16 June 1995. NOAM CHOMSKY: ‘Minimalist Explorations’ at University College London’s Dept. of Linguistics
Introduction
Chomskyan linguistics has had a great deal
of influence - or so people say. I happened to tape one of his technical
lectures on linguistics, delivered in University College, London, and herewith
present a transcription of it for the edification of my numerous fans. (Most
of the questions, which weren’t audible, are omitted, and many names and
technical terms are queried). The reader must imagine quite a large lecture
theatre with steeply-tiered seats, a green blackboard, and an air of
excitement.
The chief impression I received was of Noam Chomsky
as a sort of generator or powerhouse, setting the agenda in the way he describes
the ‘New York Times’ and other rags as setting political frameworks. Three
reserved rows, at the front, for department members, seated those who hung
onto every word and who would presumably regurgitate the material and draw
salaries for doing so. Of course, if Chomsky actually found a true theory
of language, they’d be out of a job.
Another striking impression, as far as I was concerned,
was that free communication within a subject - about which of course there's
much mythology - is important where it exists, so that people know what to talk
about; it avoids the sort of thing that happens in the everyday world when
newspapers don't get distributed and people don't know what they're supposed
to think about anything.
I spoke to several people after this talk, for
example an American who was supposedly studying the best ways to teach languages.
(‘Applied linguistics’ sounds better than ‘language teaching’.) I asked whether
e.g. with Latin they couldn’t try to transpose or convert Latin constructions
into English, to get the feel of Latin. This seems to be more or less heretical.
The talk was about seventy minutes in total;
a short time was subsequently allotted for questions. The copyright in
the talk is Noam Chomsky’s; the publication he talks of must have taken place,
and in any case it seems he disagrees with much of the content, so I can’t
imagine anyone would object to this Internet piece. I’m uncertain about the
copyright status of this transcription, but if there are any rights I suppose
I might as well claim them - Rae West
- NOAM CHOMSKY: “Well, Neil had suggested that I talk more
or less about the errors that I found in the last couple of weeks in the draft..
in the last chapter of an unpublished manuscript, and though I’m very impressed
by the British educational system it’s hard to believe... [Laughter]
I’ll try to talk about some recent work..
I’ll try to give the flavour.. I’ll try to get through the errors in section
ten. Let me begin with a couple of background assumptions and observations
that help I hope put things in place. They will be familiar to those in the
field.. by no means controversial.. and they become more and more controversial
as they become more specific. In fact it wouldn’t be surprising if they turn
out to be wrong.. as has happened plenty of times in the past.
The basic assumption is that there is a language
faculty, some special aspect of the mind/ brain which is dedicated to the
use of language. The language faculty consists of a cognitive system which
stores information, and various performance systems which access the information.
The - are we in the same ballpark here? I hear voices [laughter] -
The cognitive system characterises an infinite class of expressions; expressions
are sometimes called structural descriptions, each expression contains -
[Interruption by an official about people
standing, fire regulations.. much shifting of people into seats...]
- each expression contains the information about a particular linguistic
object, the relation between the language and the set of expressions is what
is technically called strong generation. There’s a notion of weak generation..
it doesn’t seem to have anything to do with natural language, it’s caused
an awful lot of confusion. The information in an expression has to be made
available to the performance system somehow. And one standard assumption
which I’ll keep to is that it’s made available in the form of what are
technically called linguistic levels.. there are other ways in which it could
happen; you can imagine dynamic systems of various kinds. But I’ll assume
that information is available at linguistic levels, at interface levels relating,
providing information from the cognitive system to one or other performance
system.
I'll further assume, as is standard but may well
turn out wrong, that there are just two interface levels, one of them connected
to the sensori-motor systems and one of them connected to all the other systems
of language-use, the representations of the objects at these levels are called
pf and lf, phonetic form and logical form, representations, so that means
an expression is at least a pair of representations, one at the pf level,
one at the lf level, maybe at most that, but it’s at least that. That’s a
way of restating in complicated modern terminology that language is a sound
with a meaning, in traditional terminology.
Now the language, the system that characterises
the infinite set of expressions, I will assume that it’s a recursive procedure,
not some other way of characterising infinite sets, and there are such
other ways; and the system that forms, that generates
the lf representation, that system I'll call syntax, in one of the many senses
of the word.
The er, well, skip some. This part of the subject more or less revives or gets its
present form about 45 years ago, and the early research for several
decades was driven by the tensions, the still unresolved tensions, between
two different goals, which lead you in different and opposite directions.
One goal is to provide the information about languages with factual accuracy,
that is to construct recursive systems for languages, grammars, in one of the
senses of that much-abused term. So to provide grammars that give the facts
accurately about Swahili, and about English, and Hungarian, and so on.
To the extent that a system does that accurately,
it’s called descriptively adequate. The other problem is a more interesting
one, and that’s the problem of finding out how anybody knows the descriptively
adequate grammar, how anyone knows how you and I know or every child knows,
and that’s the problem that’s called explanatory adequacy. A theory of language
is said to meet the condition of explanatory adequacy to the extent that
it is able to show how, from the data that might be available to a child,
you can get a descriptively adequate grammar.
So that turns out, when you think about it, to
be a theory of the initial state of the language faculty. A theory of the genetically
programmed initial state of the language faculty; and a theory of that faculty
is said to meet the condition of explanatory adequacy to the extent that
it does this. Well, explanatory adequacy is a much more interesting topic
and a much harder topic. But the search for descriptive and explanatory adequacy
sends you in opposite directions. As you try to get more and more descriptively
adequate grammars, they get more and more complicated and intricate and specific
to particular languages and particular structures in particular languages
and so on and so forth, but you KNOW that's the wrong answer, because the
right answer must be that there's only one language, they're all identical,
otherwise it would be impossible for anybody to know any of them, cos you
just know too much given the data that’s around so you must have known it
to start with, so it must be that all of this proliferating complexity is
just misleading epiphenomena and if you could only see the truth - you know
- you could see that it’s just all the same system with minor
modifications.
So the search for explanatory adequacy is leading
you towards saying all this stuff doesn’t exist, and the search for descriptive
adequacy is leading you to say, look, it’s way more complicated than you
thought. And for a long time the main research programmes were directed to
trying to resolve this obvious conflict or tension - not contradiction, just
tension. And the way it was done - you can imagine many ways - but the way
that turned out to be fruitful, was to try to find properties of rule systems
which you could abstract away from particular languages and just attribute
them to the initial state of the language faculty, and show that when you
abstracted these principles and properties away, you got systems that were
less complicated than they looked, so all the complicated varying details
turned out to be special cases of the interaction of some principles, and
so on. Well, that went on from about 1960 up until say 1980, that was the
main course of research. Around 1980 a lot of this stuff fell together all
of a sudden, and it sort of crystallised into another way of looking at things
which had been building up all through these years, into a system that since
then people have been calling the principles and parameters framework. It's not a
theory, it's a framework. It becomes a theory when you fill out the details.
But the principles and parameters framework, which is a very sharp departure
from traditional grammar, this is a field that goes back 25 hundred years,
in fact a lot of the work that has been done is not all that dramatically
different from what Panini was doing 25 hundred years ago. But the 25 hundred
year tradition had some common threads through it. One common thread is this,
what you know if you studied Spanish or something. When you study a language
you have a chapter in the book which is on how to form relative clauses in
Spanish, and how to form verb phrases in German, and so forth. And they’re
very specific - complicated detailed rules, although they don’t begin to
cover the data.
I mean as soon as people started this they immediately
found that traditional grammars didn’t even begin to be descriptively adequate;
they just ignored everything. Which makes sense - because people know it
anyway. [Laughter] You just confuse people if you try to spell it
out even if you knew it. But the properties of a grammar, the rules that
you find, are specific to particular languages, and even to particular
constructions in particular languages. And modern generative grammar took
that over.
So if you look at early generative grammars
you’ll have rules for forming the passive in English, and other rules for
forming the relative clause in Italian, and so on and so forth. The principles
and parameters approach says there aren't any rules and there aren't any
constructions, so it’s a very radical break. All that there are, are universal
principles, which are part of the initial state of the language faculty,
and then it has possibilities of variations, small possibilities of variation,
called parameters. So there is something language universal, namely the
principles and the possible parameters. There’s something language specific,
namely the choices of values for the parameters, and that’s all. Things like
the passive in English or the relative clause in Italian are, from this point
of view, taxonomic artefacts. Kind of on a par with you know, ‘large
mammal’ or ‘household pet’ or something, they’re real, but they have no
scientific, they don’t exist in the universe, they’re just kind of the
interaction of a lot of things to do with this. So that’s the approach, and
it's a big change, and it changed everything.
One change is that we now have a way of saying
what should have been obvious all along, namely that a state of the language
faculty is inevitably going to be completely different from anything that
you might reasonably call a language. A language is going to be defined as
a particular choice of values of parameters. So there’s n parameters, maybe
they have two values, pick the choice for each one, that’s a language. The
state of the language faculty is never going to be known like that, it’s
always going to be the result of crazy and uninteresting experience, and
in fact even uninteresting history of the languages, and so on and so forth.
So we can now distinguish a language, now let’s call it an i-language just
to make it clear, that’s a very technical notion, i for internal, individual,
intensional in the sense of an intensional characterisation of the generative
function. So an i-language is a set of choices of parameters, and it’s distinct
from a state of the language faculty. That's something else, and something
not especially interesting. Furthermore, a goal is to try to show that there's
a unique i-language, that is that there’s just one and only one i-language,
at least within syntax, within the part of language that’s forming logical
form representations, the interface with the systems of language use. This
approach puts the question of explanatory adequacy on the research agenda,
it doesn’t solve it, but it makes it a formulable question for the first
time. Up until this time, you really couldn’t formulate it, so that the most
that anybody could dream of was what was called an evaluation procedure that
would choose between alternative proposals as to what might be the theory
of a language. But there was no way to talk about how you might gain one
or the other from data.
But if this approach turns out to work, there
is an answer, namely the parameters have to be designed so that the values
for them can be determined on the basis of extremely little data. And that,
if you can do it, would solve the problem of explanatory adequacy, it would
say that the child gets a little bit of data and says OK I’m this type of
language, and I’m that type of language, and once you’ve answered all those
questions everything works, because the principles are already in there, so
you have a language. So it's possible to pose the problem of explanatory adequacy;
it gets on to the research agenda. This immediately led to completely new
ways of looking at questions of acquisition, and typology, and sentence
processing, and all sorts of other things, and it also raised new internal
questions.
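[Editor's note: the following is a small illustrative sketch in Python, not
anything from the talk, of the idea just described - a language as a choice
of values for a fixed set of universal parameters, each fixable from very
little data. The parameter names and the trigger sentence are invented.]

UNIVERSAL_PARAMETERS = ["verb-raising", "null-subject", "overt-wh-movement"]

def set_parameters(primary_data):
    """Toy learner: each parameter flips to True on a single triggering datum."""
    grammar = {p: False for p in UNIVERSAL_PARAMETERS}   # default values
    for sentence, parameter in primary_data:
        grammar[parameter] = True                        # one datum suffices
    return grammar

# A child hearing one trigger sentence has, on this picture, "chosen a language":
child = set_parameters([("Jean embrasse souvent Marie", "verb-raising")])
print(child)   # {'verb-raising': True, 'null-subject': False, 'overt-wh-movement': False}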
The main internal question of course is to find
the principles and define the parameters, and the search for that led to
quite a real explosion of empirical work in the last ten or fifteen years.
I’m sure much more has been learned about language in the last ten or fifteen
years than in the whole preceding 25 hundred, with new questions that nobody
ever thought of, and lots of new answers, and new theoretical ideas, and
also typologically quite diverse; by now, there’s work of this kind going
on in a very wide range of typologically different languages which are being
looked at in new ways and so on, and also - and here’s the topic I want to
talk about, finally! - and also, since there is a conception at least of
what an authentic theory might look like for the first time ever, then -
namely one in which the question of explanatory adequacy can at least be
raised in a serious way - not answered, but raised - given that, you can
start asking some harder, more interesting, and more principled questions
about the nature of the system. Now there are several of those. Generally
speaking, the question is, when you look closely, how much of what you’re
attributing to the language faculty is really driven by empirical data.
You can now, instead of trying to get a patchwork
system where you get things to work, look closely and ask how much of what I am
postulating is really necessary, given the empirical data, and how much is
there to sort of solve engineering problems. Looking at the same question
from another point of view, you're asking, of the language faculty and the
i-language that instantiates it, how good a solution are they to a set of
general boundary conditions that are imposed by the external systems; the
language faculty's embedded in other systems, and they put some constraints
on what the language must be. Like, it’s gotta be, have a, linear temporal
order of speech, that’s an external condition. And the interpretive systems
have to find phrases and what are called ?theta relations, semantic relations
among the phrases; these are external conditions imposed by the systems on
the outside, the interpretive systems, and there are such conditions, and
given those general boundary conditions, how perfect a solution is language
to satisfying those conditions?
It’s kind of picturesque, but how
‘perfect’ is language, given the output conditions and I’ll call them bare
output conditions, to distinguish them from other kinds, output conditions
that are often used, that are really parts of the computational system, like
filters, and ranking of output constraints, and so on. That’s part of the
internal computation at the output, so I want to talk about bare output
conditions, those that come from the external systems. And in principle you
could learn about those independently, like you could study the articulatory
system, or if you knew enough, you could study systems of language use, and
you could say well, what are they requiring the language faculty to give
you at the interface, and you can ask how perfectly language satisfies the
condition that those things impose. Well, that brings us to what I’ve bin
calling the minimalist programme...which is an effort to really explore the
intuition that language is surprisingly perfect. In a sense that naturally
we wanna make precise. And from exploring this intuition, we want to make
sure that we are accepting any structure at all only if we can really show
that it’s motivated by empirical data. And that’s turned out to be, it’s
an interesting programme, I don't know if it's right or not, but
it's leading in interesting directions. Optimally, you can see what you
oughta try to find if language is really perfect. The students, like ?Reeder,
will remember that ever since the principles and parameters approach got
formulated, every class of mine in the Fall I always started by saying
let’s see how perfect language is and we tried to make it perfect and it
always turned out to be hopelessly imperfect, as the thing got on. But somehow
in the last couple of years it’s again started to fall together and maybe
it really is perfect, which would, if true, be extremely interesting because
it makes it totally unlike anything in the biological world, as far as we
know. I’ll come back to that. Well, optimally, it should be the case that
there aren’t any other levels, just the interface levels. Nothing else. Which
means no ?d-structure, no deep structure, surface structure, s-structure,
none of that stuff. There should be no structural relations, other than those
that are forced by the interface conditions. That means no government, no
binding theory, internal to the language faculty. That means all the traditional
notions, including the ones taken over by generative grammar, have to go.
They wouldn’t fit. And other properties of that kind as we go along. Well,
OK. Let's go along a bit. The i-language, that is, this thing which now we identify
just as a set of parameter choices, the i-language has two components: one
is a lexicon, which I will take in the traditional sense to be the repository
of exceptions, so the things that aren't principled, like the fact that the
sound ‘tree’ goes with the concept ‘tree’ rather than with some other thing.
That’s in the lexicon. And then there’s a computational procedure, and I
wanna try to show that that’s unique and invariant at least in the syntax,
there’s only one of them. Martians looking at humans would say there’s one
language with a bunch of lexical exceptions. The computational system takes
some kind of collection, it's called an array of lexical items, pick 'em
out somehow, and it carries out a computation in a uniform fashion, and it
ends up forming interface representations. So, now, assuming a pair of interface
representations pf and lf, what’s the array? Well, it has to have at least
some structure, and without going into it, I’ll assume it at least has the
structure of what I’ve called an enumeration elsewhere and probably much
more. It won’t matter for this, I won’t go into it. But the array has some
kind of structure; how much we really don’t know, we’re guessing. The er,
we’ll say that the derivation converges at an interface level if it’s
interpretable at that level. And that’s a property that’s determined by the
external system.
Otherwise it crashes at that level. A derivation
converges if it converges at both levels. Separately. That assumes that
there’s no interaction between the pf and the lf level, which again is a
very strong empirical claim, and there’s a lot of evidence against it. So
if it’s true, it’s interesting. But if the language is really perfect then
you’d expect a few things to happen independently, so I’ll assume that er
making perfect assumptions.
So we assume, and then we say, that the derivation
crashes if it crashes at either level. Er, well, in looking at the lexical
items, each lexical item is some complex of properties, properties we can
call features, so lexical items have a complex of features,
and given this much structure you can distinguish three types of features.
There are those that are interpreted at the phonetic level, the pf level
so-called, category p, they’re accessed at the phonetic interface, you know
like aspirated p and that sort of thing. Er there’s those that are accessed
at the lf level called semantic, though it’s misleading, so the semantic
features are accessed at the lf level. And then there are others, which are
just accessed by the computation itself. Call them formal features. So we
have three kinds of features, phonetic, semantic, formal, defined this way.
Now these sets can overlap in all kinds of ways. A further assumption is that
the phonetic ones are disjoint from the union of the other two sets. So the
phonetic features are not found, they’re a separate set, they’re not in the
other two sets, semantic and formal. What about the semantic-formal relation?
Well that actually is a traditional question. That’s sort of the question
you know whether verbs refer to actions, and nouns are names. And so on and
so forth. Notions like noun and verb are formal, they’re accessed by the
computation, but notions like action, and thing and so on are not, they’re
semantic or whatever the right ones are ?, and the question of how the formal
and semantic interact is a version in this system of the old traditional
questions of the semantics of grammatical categories and so on. What about
the parameters? Where are they? Well, a nice system would say that they’re
only among the formal features, that is the phonetic features and semantic
features aren't parameterised. So let's try that: only the
formal features are parameterised. The narrower and more restrictive you can show
the parameterised features to be, the easier it is to deal with the problem
of explanatory adequacy.
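[Editor's note: a minimal Python sketch, added for illustration only, of the
three feature types just distinguished - phonetic features accessed at pf,
semantic features accessed at lf, formal features accessed by the computation -
together with the assumption that the phonetic set is disjoint from the other
two. The particular feature names are invented.]

from dataclasses import dataclass, field

@dataclass
class LexicalItem:
    phonetic: set = field(default_factory=set)   # accessed at the pf interface
    semantic: set = field(default_factory=set)   # accessed at the lf interface
    formal: set = field(default_factory=set)     # accessed by the computation itself

    def respects_disjointness(self):
        # phonetic features must not overlap the union of the other two sets
        return self.phonetic.isdisjoint(self.semantic | self.formal)

tree = LexicalItem(phonetic={"t-aspirated"}, semantic={"thing"}, formal={"N", "3person"})
assert tree.respects_disjointness()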
That’s a tough problem, how everyone knows this
stuff, with no evidence. So you wanna make sure the right theory oughta have
the answer, it’s a very small number of things to learn and a very circumscribed
place. Well, one kind of circumscribed place is just the formal features.
An even more circumscribed proposal is that the only thing that’s parameterised
is the formal features of what are called functional categories. Functional
categories are the ones that lack any non-trivial semantics. So not verbs,
adjectives and nouns, they’re not functional categories. They have non-trivial
semantics. But the others - and you have to explain what you mean by this
- but the others with trivial semantics or no semantics are the functional
categories. Er, and they have formal features like others, so maybe the
parameters are only in the formal features of functional categories. And
a still narrower theory, and one which begins to look more reasonable as
we go along, is that the only parameters in the syntax at least have to do
with one particular property of formal features of functional categories,
and that is whether they come out the mouth. Er so are they pronounced or
do you just compute them in your head, you know? And languages seem to differ
- the sort of intuitive picture is look, there’s one computation going on.
Er no matter what language you’re speaking you’re carrying out the same mental
computation. But languages differ in how the sensori-motor system accesses
it. So some access it at one part, and another accesses it in a different
part, and that makes the language LOOK very different, but again, from the
Martian point of view they’re essentially identical with a trivial difference
and also from a child’s point of view, and that’s the important part.
The languages MUST look identical from the
child’s point of view. Otherwise it’s impossible to learn any - that’s the
driving empirical fact that’s hanging over your head like a cloud all the
time and it makes the subject interesting. So a possibility to be explored
would be that the only property that’s parameterised is what’s technically
called strength. Are you a formal functional feature which is pronounced,
or are you one that’s just computed and unpronounced? OK. So like the idea
would be that say in Latin the cases actually get pronounced, but in English
you got the same cases but they just aren’t pronounced. So you’re just grinding
away in your head. You see the EFFECTS of them, you get the consequences,
but you don’t hear them. And similarly with other things in other languages.
So the, if you could show this, that would mean that the typological variety
of languages reduces pretty much, maybe entirely, to just the question of the
various combinations in the way the unique invariant computational system
is accessed.
Well, this is a little too strong, and let’s
look at how too strong it is. Remember the driving empirical question is
acquisition - how can anyone acquire a language? And the fact is, that in
parts of language that are close to the data, that are close to the phenomena,
you’d expect variation to be possible. So, you know, different kinds of phonetic
variation, you can hear them, so you can get language variation in them.
On the semantic side, you might imagine, and maybe it’s true, that things
like semantic fields in the traditional sense are just variable within some
range, because a little bit of data might tell you something about how a
set of concepts is broken up one way or another in what are traditionally
called semantic fields. So you’d expect some variety round the periphery.
Phonetics, peripheral semantics, and so on. I’m gonna abstract away from
that and just talk about the rest, and when I say that all typological variety
is in the, I propose, is in the strength of formal features of functional
categories, I’m abstracting from that stuff. Well, can we make it even
narrower?
It looks possible, so let’s ask what formal features
can have this property of strength. Well, er it's only functional categories,
I'm assuming, that are parameterised, and the only features that are parameterised
are features that say in effect I need a certain category. Like I need a noun
phrase, or I need a verb. But not other features, like I need case, or I
need number, or something like that. So the strength, possibly, is reducible
to the need category property of formal features of functional categories.
That would mean to say specifically that t, tense, which is a functional
category, may or may not have the feature I need a dp, or noun phrase, basically.
And if it has it, that's the feature that gives you the extended projection principle.
If you have it, extended projection principle, you necessarily have a subject;
if you don't have it you don't. That's the d-feature
of tense. But you can't have a case feature. And t might or might not have
the property, I need a verb. If it does, you have what are called verb-raising
languages, if it doesn't, you don't have verb-raising languages. Maybe that's the
only - but no properties of say verbs or nouns, they can’t have strength
features, and no other access properties, other than category access. Well
that’s then a very narrow class of possible variation of language, and if
the system works out, that’s the way it’ll be, languages are all the same,
except in one small corner of what comes out the mouth.
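[Editor's note: a toy Python sketch of the proposal just outlined - that the
only parametric variation is whether a functional category carries an overt
"I need category X" feature. The category labels and the three example
settings are invented, loosely echoing the Latin/English contrast mentioned above.]

FUNCTIONAL_CATEGORIES = {"T", "C", "D"}

def i_language(overt_needs):
    """overt_needs maps a functional category to the categories it overtly requires."""
    assert set(overt_needs) <= FUNCTIONAL_CATEGORIES, "only functional categories may vary"
    return overt_needs

subject_raising = i_language({"T": {"D"}})        # T overtly needs a DP: overt subjects
vso_like        = i_language({"T": set()})        # T needs nothing overtly: VSO order
verb_raising    = i_language({"T": {"D", "V"}})   # T also attracts the verb overtly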
From this point of view, the relation to the
sensori-motor system is sort of extraneous to language. It’s like a nuisance
on the outside, imposed by external systems, so, like, if we could think
and communicate by telepathy, let’s say, then you just dump all this stuff,
and you just carry out the one unique computational process, that’s sort
of the idea. And it would be nice to show that the imperfections of language,
of the kinds you might work out if you were sitting somewhere and you
were god or something like that, that those imperfections - not that we want
to get too exalted a self-image round here! - [Laughter] but the, try
to show that the imperfections, as much as you can, really result from
the sort of extraneous fact that because of the you know ridiculous lack
of telepathy we’re forced to turn all of this stuff into a sensori-motor
output which has certain properties, cos that’s the way the mouth works,
and so on and so forth. That’s a further goal. Well, among the formal features
- we’re now concentrating down on those - some of them are what we might
call purely formal; that means they’re formal but not semantic. Remember
these two categories overlap. So take those that are purely formal. They’re
not semantic at all.
That would be things like - that means they get
no interpretation at the interface level. Well, an example would be say case
for nouns. The case of a noun doesn’t affect its interpretation; it’s interpreted
the same way if it’s nominative or accusative let’s say. Verbs also have
a kind of a case property, an assigning property, like some verbs assign case
and some don’t. Well, whatever that property is, it’s not interpreted. The
semantic correlate to it might be, like transitivity might be, but not the
case-assigning property itself. Er on the other hand, what are called the
phi-features, the features like number, gender, and person, - number and
person at least and sometimes gender depending on the language - those features
get interpretations at the interface, like interpret a plural noun differently
from a singular noun. So the phi-features of nouns, they get interpreted.
On the other hand, the same phi-features in verbs and adjectives don’t get
interpreted. So a verb is interpreted the same way whether it’s singular
or plural. All right.
So the phi-features of nouns are interpretable,
the phi-features of verbs and adjectives aren’t interpretable, nobody’s case
features are interpretable. That turns out to be quite a crucial distinction,
it’s a principal distinction determined by output conditions, and it has
effects if you think about it. This is something that wasn’t really thought
about till quite recently. That’s part of the problem that’s bin making things
look imperfect for the past ten years or so, that we haven’t noticed that
distinction. When we notice it a lot of things fall out. But it’s a clear
distinction and a highly principled one.
Interpretable features cannot be erased in the
course of a computation. Because they’ve gotta be interpreted at the output.
On the other hand, uninterpretable features MUST be erased in the course
of a computation, because they have no interpretation at the output so if
they survive the output it crashes. Well that tells you right away a lot
about the structure of a computational system. It says whatever it’s doing
you can't get rid of any interpretable features, like say plural in nouns,
and it MUST get rid of case features, like nominative in nouns. OK. And
it must get rid of plural in verbs, cos that’s not interpretable. And in
fact you can now sort of glimpse what a perfect system would be. It would
say that the only operations there are, are the ones that get rid of pure
formal features that are uninterpretable. There aren’t any other operations.
So the computational system will be restricted to operations which get rid
of uninterpretable formal features, and the only well-formed derivation,
you know the only computation that gives you a linguistic object, is one
that adhered to the principle that it didn’t do anything except some operation
that got rid of uninterpretable formal features. Notice that there’s a difference
between what’s called structural and inherent case in this respect: inherent
case is semantically-related case, case that’s assigned by virtue of a semantic
relation, like the genitive case comes out in English with an of phrase,
assigned by an adjective, you know you say 'proud of John', the relation between
'proud' and 'John' is a semantic relation. And that's inherent case. And
that's distinct from structural case, which is purely configurational. Like
nominative case assigned to whatever’s in the subject position - it may have
no semantic relation to anything - accusative and nominative cases are typically
structural, other oblique cases are typically purely inherent and they have
all sorts of different properties. The inherent cases are interpretable,
cos they reflect the semantic relation; the structural ones are not, so it
oughta turn out that things with inherent case are invisible to the computational
system, cos there’s nothing that they have, their phi-features are interpretable
and their case is interpretable. We really shouldn’t call it ‘case’; it’s
just called case cos it’s kind of similar morphology. But it’s functioning
in a completely different fashion, functioning as a reflection of the semantic
relation, the other isn’t.
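[Editor's note: a small Python sketch, purely illustrative, of the convergence
consequence just drawn - a derivation converges at lf only if no
uninterpretable feature survives there. Which features count as interpretable
follows the talk (phi-features of nouns, inherent case); the names themselves
are invented.]

INTERPRETABLE = {"plural-on-noun", "person-on-noun", "inherent-case"}

def converges_at_lf(surviving_features):
    """True only if nothing uninterpretable reaches the lf interface."""
    return all(f in INTERPRETABLE for f in surviving_features)

print(converges_at_lf({"plural-on-noun"}))                      # True: converges
print(converges_at_lf({"plural-on-noun", "structural-case"}))   # False: crashes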
And in fact it should follow then that the
computational system is only looking at things like, it only can see, things
like structural case, phi-features of verbs and adjectives, strength of features
which is not interpretable, and things like that. Those of you in the field
will recognise that this is the core idea of Jean-Roger Vergnaud's case theory
which set off some of this work years ago. Well it should follow then that
all movement operations, you know transformations, all movement operations,
should be related to, they should apply just in the case that they are
contributing to the erasure of pure formal features. Checking erasures.
It’s more complicated than this, but something like that. And it also should
turn out that parametric variation should have to do only with the strength
of formal features, cos the others you’ve got to get rid of anyway, and the
only variation should be which ones you get rid of in such a way that it
affects the phonetic output. OK. And which ones you just get rid of in
your head and it doesn't affect the phonetic output. Well, er let's suppose
that that’s true. If that’s true, it’s a nice elegant system and it’s kind
of perfect. What does it mean to say that a formal feature is
'strong'? Incidentally here's where I'm starting to correct stuff in the in-press
version of the final paper I've been talking about, 'cos it's got it wrong.
But if you think about it, the notion of a feature being strong or weak has
sort of bin mysterious: how can a feature have a further property, how can
it have another feature, like I'm strong or weak? Well, if you think about
it, it doesn't have another property. To say that a feature is strong is
just to say that it's there. If it's there it's strong. If it's not there,
it's not strong. So the d-feature of tense, the feature that underlies the
extended projection principle, you know, that says some languages have subject
verb object, and others verb subject object and so on; that property is just
the strong feature of tense, which is now the feature I need a dp. That
feature is either there or not there. If it is there we'll call it strong.
If it isn't there, so, we don't call it anything, it isn't there. So
there’s no feature of strength over and above the other features, it’s just
a way of referring to the features that are there in this sub-category of
parametric variation which reflects what happens to come out the mouth. If
tense has a d-feature, you have an SVO language or an SOV language; if tense
lacks a d-feature you have a VSO language. If tense has a v-feature you have
a verb-raising language, French type; if it lacks a v-feature you have a non-
verb-raising language, you know English, Scandinavian type, but that just
means the verb remains in situ, it doesn’t have to go anywhere, unless it
does for some other reason, like if the verb has to get all the way up to
complementizer ? to raise, but that’ll be for other reasons. It also follows
that in languages like English the tense-verb relationship should be actually
of a kind that was proposed in the 1950s, of a kind that came to be called
affix hopping, with just some boring phonetic property, that irrelevant
sensori-motor component, which is relating the feature to the verb because
features can't just hang around freely and still be pronounced by the mouth.
They gotta be attached to something. So it's kinda lowering; ?Lasnik and
others have been pursuing that framework. So the term strength is just in
the mind, not to be taken seriously as in this coming-out chapter, and it
just means the need category feature of a functional category is there. Period.
Now the, this distinction between interpretable and not, happens to have
a big range of empirical consequences; there's a lot about that in the stuff
that's coming out, so I won't talk about it. And in a perfect theory, in
an ideal theory, movement should be restricted then to very narrow convergence
conditions related to uninterpretability of pure formal features, it should
be down to that, and that’s what a kid looks at when it’s learning language,
the kids you think are so dumb, they’re looking for the uninterpretable strength
features of functional categories; that’s what they’re looking at, according
to this story. [Laughter] Now the general picture then is this initial
array, with whatever structure it has, is going to, is generating, you know,
deriving, this lf representation - and I'm only looking at that side - by the
operations that are driven, forced in fact, by the bare output conditions.
Well, one of these operations is the one that's called spell-out; the pf
and the lf representations are distinct, in fact probably disjoint in their
properties, not just distinct but actually disjoint, so somewhere
along the derivation it's got to split into two paths, one gives you the
sensori-motor side, and the other just goes merrily on its way with the syntax.
Now, general assumption is, the simplest assumption - we’ll keep to it unless
we’re forced otherwise - is that any operation can apply anywhere. OK. So
the operation spell out can apply anywhere, and what it does is remove the
p features, it takes them away and everything else keeps going, and the p
features and whatever else it takes away, they just go off into what's
called the phonological component, and meanwhile the array to lf derivation
just keeps going, now deprived of its p features, but otherwise going without
change. If something in the numeration, in the initial array, doesn't get
used by the end, well it just isn't a derivation. It's like a proof
that's missing a step, or something. It isn't anything, so we throw it
out. If it happens to end up with something which includes an uninterpretable
feature, it crashes, so it still isn’t a real derivation. Well, the - let’s
proceed. The further principle you might wanna have and might wanna see in
a perfect theory, I’m now talking about the principal part, the array to
lf syntax - a perfect theory oughta have the property which I might call
uniformity which is that no operation is restricted to one or other part
of the computation. Now there are basically two parts - there’s the part
before spell-out, and the part after spell-out. Let’s call them overt and
covert for the obvious reason. So there shouldn’t be any principle saying
that some operation can only apply say in the overt part, or only in the
covert part. And if you meet that condition, let’s call that uniformity,
another condition you might wanna meet is what you might call inclusiveness,
and that would say that nothing enters into the computation beyond the initial
lexical features from which it began. So that would mean that the whole
computation down to lf is just a rearrangement of lexical features. In that
case we’ll say the condition of inclusiveness, the empirical meaning of that
is you can’t have any bar levels, or indices, or any of that kind of stuff
that all that has to go, because none of that is in the initial lexical,
it’s not in the lexical entry. That throws out an awful lot of technology
so it means everything based on that technology’s gotta be wrong and the
problem is to show it. So a really perfect theory would meet these two
conditions. Let’s assume it does.
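[Editor's note: a toy Python sketch of spell-out as just described - the
phonetic features are stripped off and sent to the phonological component
while the remaining features carry on toward lf - and of inclusiveness, in
that the lf side is built only from the features the items started with.
Representing a lexical item as a dictionary is an invented convenience.]

def spell_out(syntactic_objects):
    """Split each object's features: phonetic ones go off to PF, the rest continue to LF."""
    to_phonology, to_lf = [], []
    for item in syntactic_objects:
        to_phonology.append(item["phonetic"])            # off to the phonological component
        to_lf.append({k: v for k, v in item.items()
                      if k != "phonetic"})                # derivation keeps going without them
    return to_phonology, to_lf

pf_side, lf_side = spell_out([{"phonetic": "/tri:/", "semantic": "tree", "formal": "N"}])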
Furthermore the derivations have to meet a kind
of economy condition, an optimality condition, which says that a derivation
gets interpreted, a computation gets interpreted, only if it is the most
economical convergent derivation, and to define that properly turns out to
be quite intricate and important [TAPE PROBLEM here... bit of a gap..]
at once introduces rather serious questions of computational complexity.
The reason is, you’re comparing derivations, to find out, to decide, whether
you’re hearing something, you want to decide whether it’s interpretable,
you have to be able to compare derivations. Any of you who know anything
about automata theory and that kinda stuff will know this can lead to WILD
computational complexity problems. And it would be nice, in fact necessary,
to cut - to show that they don’t arise. Or rather, more precisely, to show
they just arise in the class of cases which are unintelligible. Now we know
that an awful lot of well-formed language is totally unintelligible. So you
only use scattered parts of the language, because the rest is just not
intelligible. .. short, and simple, and well-formed and so on, but now
there’s a problem, and a very interesting problem, which is just kind of
lurking around the horizon, you can formulate it, you can’t really solve
it, and that would be to try to show that those scattered parts of language
which are usable are in fact those parts in which problems of computational
complexity don’t arise, OK. That’s a really hard problem, and it’s interesting,
problem of the nature of the hard physical sciences, tough problem, therefore
an interesting one; I don’t know how you could answer it but you could think
how to approach it.
So that problem’s kind of on the horizon somewhere.
So how do you approach it? First by looking closely at the economy conditions
and trying to see to what extent you can show that they DON’T introduce
computational complexity problems. And there’s a lot of natural ways to go.
So for example to cut down, cutting down computational complexity, means
shaving away the number of things you have to look at when you decide whether
some derivation is correct. So make sure you don’t kind of get exponential
blow-up at each point. Well, one step towards it is to suppose that at every
step of the derivation you don’t ask about all the derivations that are possible,
you just ask about the most economical next step. So what’s the most economical
step that can be taken NOW that’ll lead to a convergent derivation? So that
entails that as you’re moving through the computation the class of things
you have to look at is narrowing all the way along, you know, and gradually
gets quite small. Still a big class, but it narrows. Another - let’s assume
that that’s the way economy conditions work - actually all of this stuff
has empirical consequences, every such proposal has a lot of empirical
consequences, so you have to check ‘em out.
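[Editor's note: a toy Python sketch of the "most economical next step" idea
just described - rather than comparing whole derivations, at each point you
choose the cheapest available operation that can still lead to a convergent
derivation. The cost figures (merge free, move costly) follow the talk's
later remark; everything else is invented.]

OPERATION_COST = {"merge": 0, "move": 1}

def most_economical_next_step(options):
    """options: list of (operation, can_still_converge) pairs; pick the cheapest viable one."""
    viable = [op for op, can_converge in options if can_converge]
    return min(viable, key=OPERATION_COST.get) if viable else None

print(most_economical_next_step([("move", True), ("merge", True)]))   # 'merge': free beats costly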
[Tape Turned Over About Here]
Another question has to do with what everybody
assumes to exist somehow, locality conditions of various kinds, in fact ?
a book called Locality Conditions. But the er, so the big problem is to find
the locality conditions. Well, one kind of locality condition is to say that
movement should take place, be as short as possible, minimal chain link condition
it’s sometimes called, minimum link condition. Now, what kind of a condition
is this? Well, in the best possible world, this would be an inviolable -
this is a very hard thing to figure out. I mean computationally it’s extremely
hard to know what’s the shortest movement. For one thing, you have to compare
all sorts of operations. For another, it introduces conceptual problems which
are kind of unformulable, like how do you compare shortening a derivation
in one part with lengthening it in another, you know how do you, which is
the shortest? There’s no meaning to that. Well, the best way through this
whole mess would be to say the question can’t arise, that there only are
shortest possible movements, so it’s an inherent property of rules that they
MUST be the shortest possible, and if you violate it you’re just not doing
anything, it’s like you know playing chess and making an inappropriate move
or something, or trying to prove a theorem and doing something that isn't
a rule of inference. There's no question as to whether it's good or bad
or short or long, it just doesn’t exist. So the only derivations are those
that satisfy the link condition. That would be a nice property, and the empirical
consequences turn out to be pretty reasonable I think; again, there’s a lot
about this in this stuff that’s coming out. Well, let’s assume that’s right.
Another proposal would be to show that operations take place only if
they’re forced by unchecked, so far unerased, pure formal features. Only
in that case can the operation take place. That cuts down - things of that
kind cut down the class of computations that have to be inspected, quite
radically, still leaves it you know, too big, but at least these are the
kinds of steps that can be taken first, towards eliminating computational
complexity. The problem at each point is to show that the empirical consequences
of such a proposal, which are usually very extensive, that they’re right.
If they’re wrong, too bad. But if they’re right, you have an idea, you think
you’re on the right track, you’re on the way to cut down computational
complexity. Well, this stuff is discussed in this mystical fourth chapter!
So, what are operations? Well, the bare output
conditions suffice to tell you that there are at least two, three in fact.
One of these operations has to be spell-out, which I’ve already mentioned
and that’s because there are at least two interface conditions which are
separate. And the second operation that’s forced is, let’s call it merge;
take two things you’ve formed already and make a third thing. That amounts
to saying er a sentence isn’t just a set of lexical items, it’s some kind
of structure formed from them, which it obviously is. So, bare output conditions
force you to say that when you have constructed linguistic objects, you’ve
constructed a bigger one from two of them and we call that merge, and you
try to make them as simple as possible, and notice that merge doesn't carry
any cost, it’s free. And the reason is if you have an array of items and
if you don’t apply the operation merge often enough you’re gonna end with
some of the items unused. And therefore it crashes. And so merge is free,
it doesn't have any cost. When you're counting the economy, you don't count the
number of times you've done merge; it comes for nothing. Er the last operation
that seems to be forced, and this just looks like a property of natural language,
quite different from invented symbolic systems at this point, is the operation
call it move, and that expresses an irreducible fact about natural language,
which is captured in one or another way in every theory, people don’t like
to say it, and that is that things are interpreted in positions that are
displaced from, er they APPEAR in positions that are displaced from where
they are interpreted. And that's just a fact, you know. Look around language,
you take the pieces of an expression and you see they are interpreted somewhere
else. That’s an irreducible fact about natural language, the simplest expression
of that fact is to say there are objects, called chains, which simply express
the relationship between the position and the point of interpretation, and
transformational grammar’s one way of working that out. Other notations sometimes
claim to be different, but if you tease them out they’re the same, because
there’s no getting around this irreducible fact. So we need an operation
that relates those positions - call the operation ‘move’ for reasons to do
with its nature.
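[Editor's note: a minimal Python sketch of the two structure-building
operations just named. Merge takes two objects already formed and makes a
third; move relates the position where something appears to the position
where it is interpreted, recorded here as a simple chain. The pair-and-chain
representation is an invented convenience, not the notation of the talk.]

def merge(a, b):
    # build a new syntactic object from two already-formed ones
    return (a, b)

def move(structure, item, landing_site):
    # the chain links the displaced position to the position of interpretation
    chain = (landing_site, item)
    return merge(landing_site, structure), chain

vp = merge("read", "books")
tp, chain = move(vp, "books", "Spec-T")   # 'books' pronounced high, interpreted low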
Well, at this point we can return to the question
of strength. And I’m going to make some comments. I’m going beyond the
unpublished paper. I’m, you’re always gonna try to merge at the root. When
you’re building things up, to embed things by merger you can show is a much
more complex operation than just to tack it on to what you’ve already formed.
Good technical reasons for this which you’ll know if you look into the system.
So therefore you will always merge at the root if possible, and in a perfect
system you’ll only merge at the root, because you’re trying to make everything
perfect, you know. So let’s assume that merger is always at the root. It
follows that strong features can only be introduced at the root; they can
never be embedded, OK. So strong features will only appear right at the top
you know, if you think about it graphically, at the top of the tree. Or the
bottom of the tree, depending on how you look at it. But that’s all metaphor,
because there aren’t any trees, we’ve given up bar levels, all that stuff’s
gone, these are just graphic notations.
The merger will always be at the root, strength
will always be introduced at the root. Furthermore there’s another economy
principle which will be pretty interesting in its consequences and that is
in the initial array - remember, there’s only a certain class of things you
have choice about, namely the parameterisable strength features, OK, so now
we’re just restricting ourselves to that - and the principle says that one
of those things can be in the initial array only if it has an effect. Only
if it has an effect, either at pf or lf. If it has no effect either at pf
or lf, it can’t be there. OK. That economy principle which turns out to have
quite interesting consequences when you pursue it, let’s assume it’s true,
er that means one of these optional things can only be there if it’s gonna
show up somewhere at the output. It follows from that that er the strength
features - well, you can now formulate the following proposal. It doesn't
yet follow. The formal proposal would be, feels like a theorem hanging around
somewhere, that er strength features can only be introduced overtly. Reason?
If they’re introduced covertly they’re obviously not gonna have a pf effect,
and they don’t have an lf effect. OK. Now, if you can show that, they plainly
don’t have a pf effect because they’ve been introduced after the split, so
they will only be able to be there at the beginning if they have an lf effect.
Well, it seems to turn out that the only one that has an lf effect is what
drives qr.
There’s an interesting paper by Danny Fox on
this, and other work by Tanya Reinhart and others, which has been put in
a different framework, but what it comes down to saying is, you can carry
out quantifier raising, you know, putting a quantifier somewhere where you
wouldn’t expect its scope to be, only if that operation gives you an
interpretation you otherwise wouldn’t have. Well - and this is always a covert
operation. Well, from this point of view it means that er you could only
have the strength feature that says ‘move the quantifier to me’ if it has
an lf effect. Incidentally, the consequence is that in languages that have
overt counterparts to quantifier raising, this shouldn’t happen, you should
be able to do things freely; that’s Hungarian apparently according to ? at
least, who says the qr effects and so on you get in ellipsis you don’t get
in Hungarian (she claims). Tell me if it’s right! And if it’s right it would
be kind of nice, because that would mean since that is having a pf effect,
since you’re overtly moving it, doesn’t matter, you can do it even when you
don’t have an lf effect, because it’s showing up somewhere. On the other
hand in English, when you look at ellipsis constructions and so on, you can
see you don’t get these effects.
Well, you know, if it turns out it’s nice. Well,
there’s a potential theorem hanging around, which says, strength must be
erased overtly, because otherwise it couldn’t be there at all, except in
the cases where you have things like quantifier raising, where it does have
an lf effect. But verb raising doesn’t have an lf effect. So that’s gotta
be covert. Well. What about xp raising? You know, maximal projection raising..?
Like in getting the subject up there. Well that has to be done not only overtly,
but it has to be done fast. Because it has to be done before you build up
a structure that’s going past the checking domain of the ?head with the strength
feature. I’m sorry this is going to get pretty technical here, there’s no
other way to do it. So when you get a checking domain of a strength feature
if you get beyond the checking domain it’s too late, you’re not allowed to
move inside, you know, nowhere to check unless you get rid of this thing
before you get that high, so that's going to cause er, when you have xp movement,
xp raising, you're gonna have to get rid of it not only overtly but also
fast, and a consequence of that is basically minimality. It sort of falls
out of that.
... theta theory.. movement theory.. adjective
and its complement.. phi features of adjectives.. semantic relation.. structural
relation.. checking features.. you’d never move phrases.. madly pursuing
the intuition.. we see phrases moving; but that could be a mirage.. to satisfy
convergence conditions of pf.. looks as if the things are moving but they
really aren’t.. covert movement where it doesn’t have to come out the mouth..
another theorem.. if true would say that overt movement takes the minimum
phrase that’s required in order to satisfy pf convergence. .. turns out to
be much more natural.. to drop the notion of movement.. go back to an older
notion that was hanging around, say it’s really attraction; it’s not that
something’s moving to target something else, it’s that something’s attracting
somebody to get rid of one of its problems.. category has a strength feature..
look at the closest things, because of the minimum length condition .. and
that should be all of movement theory.. look for the feature that’s gonna
do the job.. ideally that oughta be all there is.. very elegant picture if
it’s true.. vast empirical problems that arise.. let’s go on to specific
matters.. functional categories.. playing a very central role.. what are
they? .. we have evidence for the existence of some of them.. tense has semantics
and it has phonetics.. similarly there’s evidence for complementizers.. similar
evidence for d, the determiner feature.. noun phrases
.. evidence for a sort of light verb.. just thin
semantics.. theta theory.. lexical shells.. decomposition theory.. transitivity
is from this theta theoretic point of view a light verb followed by a ? ..
that would make transitives kind of like causatives.
[writes on blackboard] .. so that’ll be what
a clause looks like. .. no space for the agreement complex.. motivated basically
by the fact that it provides structural positions.. interesting proposal..
seven or eight years ago.. separation of various properties of inflection..
the main thing.. bare phrase structure approach.. drop all x bar theory.. could
be any number of specifiers.. it’s beginning to look like they’re right..
seems to fill a gap.. structural position of ?agra.. lemme stop with this..
we’re gonna have parametric variation.. as to how many specifiers.. extremely
interesting.. let’s take tense.. if tense allows no specifiers, you get a
VSO language; if it allows one specifier, you have a SVO language; suppose
it allows multiple specifiers, well then you get what’s called transitive
expletives, multiple specifier languages, except these break up in interesting
ways too, like Icelandic is the case that’s been studied most in depth, mainly
because ?Höskuldur ?Thráinsson's at Harvard so we can all ask him questions
but by now there’s been a lot of study of this.. it has double subjects,
and that’s essentially two specifiers.. tense.. the two have very narrow
conditions on them.. if the first is an expletive and the second’s an argument,
you can’t have two arguments.. the same thing sort of happens in German..
what about the possibility of infinitely many of these things.. arbitrarily
many.. with all arguments outside.. the only thing that’s left on the inside
are what is sometimes called agreement elements.. you can do everything, you
can do the minimum amount, and you can do nothing. .. That seems to give
the right sort of typology. ... here is the agent.. one specifier of small
v is given by theta theory.. parameter that says I’m allowed to have a
specifier.. the analogue to Icelandic having a transitive expletive.. PhD
thesis.. the subject is higher than the object. The whole literature’s based
on that. It turns out to be false. It’s because people were looking at the
wrong examples. If you look at the right examples, it turns out it’s the
other way round and nobody's noticed it before. So in actual fact the
object is always first and the subject is always second. So you get sentences
which would be like in English er 'there read these books never any
student'. OK, that's the way it comes out. Nothing remains in the verb phrase,
everything is moved, but it's there and you know the object's in the verb
phrase.. and then there’s an expletive in front of the verb. .. tells you
the object must be able to cross the subject.. distance.. is measured by
some property of minimal domains. .
.. we know the subject is moving.. how do we
know that? .. this agrees with the verb.. these two must be equidistant..
forced to get these results.. these are the only things which will converge..
strong evidence that expletives converge.. .. the reason for that is that
merge is free and move costs.. you always do merge if it will converge..
the facts ought to be the opposite of the way they were always assumed to be..
the facts were misunderstood.. which is the kind of thing that makes you
think that maybe you’re on the right track. .. this is all stuff that should
have been in there.. but that doesn’t mean anything.. in a couple of months
it’ll all be changed again.. If it works out, it will turn out.. pursue these
intuitions.. then you’ll have strong reasons to believe that language is
a kind of a biologically impossible object.. something like inorganic chemistry..
organic world where everything is messy and so on.. may be the most interesting
thing about human language.. it just seems very different from biological
objects.. [perhaps] all biological objects are like this, we just don’t know
how to look at them.. maybe that’s also true of the biological world.. if
so maybe all of biology might look like this. .. OK.”
[Applause]
-QUESTIONS [Chomsky’s reply mostly on ‘indexing’ & .. previous
work.. it’s all wrong. Japanese has scrambling.. quite interesting work on
this.. new paper.. usual unsolved problems.. look contradictory.. and
that’s what makes it look interesting..]
[.. successive cyclic wh-movement.. reflex..
all languages really like.. Irish.. xp conjunction.. look at the history
of transformational grammar.. last 40 years.. dramatic evidence.. extraposition,
heavy NP shift, ?dp fronting.. were always called stylistic rules.. some
intuition they were something to do with style, not grammar.. interleave
all over the place.. displacement.. aren’t part of the same system of language..
sign is taken care of.. the question of truth and falsehood.. entailment
has the same kind of entailment as rhyme.. entailment relations.. even if
the semantics of lexical items is not complete.. two extremes.. each concept
is an atom.. the other is they all kind of decompose into each other, ..
kill dissolves into die and so on.. they both look wrong.. if things turn
out to be paradoxes..]
-CHAIRMAN stops questions as he’d promised to deliver Noam 1/4 hour before.