In this episode of The Joy of Science, Shambhavi Chidambaram speaks to Professor Shravan Vasishth, an Indian-origin professor of psycholinguistics at the University of Potsdam in Germany. In addition to his research, Prof Vasishth is an author of two interesting blogs—“Shravan Vasishth’s Slog”, where he talks about statistics, and “Things People Say”, a moving personal blog about his experiences of dealing with kidney failure and hemodialysis, and navigating the German health care system. This interview has been edited for clarity and conciseness and has been run past Prof Vasishth for accuracy before publication.
Shambhavi Chidambaram (SC): Prof Vasishth, your specialization is psycholinguistics. So let’s talk about it. What does a professor of psycholinguistics study?
Shravan Vasishth (SV): Alright, so there’s a general misconception about what linguistics is in the lay world. What typically happens to me in day to day life is that when somebody finds out that I’m a linguist, the first question they ask me is how many languages I speak. From a professional perspective, linguistics is a very different ball game. It’s all about the study of linguistic structure. The meaning of language, language constructs, the syntactic structure of sentences, and the study of sound. Studying the statistical properties of language is another thing. Linguistics also includes the historical development of languages, and the connections between languages, regarding how language contact changes.
SC: But linguistics, as a field, seems to have had quite a rapid change in its approach in the last one hundred years or so. Where did it start and how has it proceeded?
SV: Linguistics, as we know it today, is very different from the way it was in the 1800s and before that. All the action really began with Pāṇini, and his work, the Aṣṭādhyāyī.
Editor’s Note: The Aṣṭādhyāyī is Pāṇini’s most famous work, a text on Sanskrit grammar. It is known for the brevity, sophistication and logical perfection of its rules, and has influenced linguistic theory up to the present day.
SV: Once that became known in the West, a lot of Indologists started studying not just Sanskrit but also the connection between European languages and Indian languages. They discovered that there were some historical connections there, and this led to the development of something called historical linguistics—the study of how languages connect with each other, how they have evolved over time and what the proto-language was. They tried to infer the properties of the dead languages, like the classical Vedic language and so on.
SC: Starting with the Aṣṭādhyāyī, how has the study of languages changed over the centuries?
SV: The Indologists, who were mostly Europeans, were studying these connections, and this study of language, from a language comparison perspective, slowly evolved into a school of research called structural linguistics, where people started to look at the linguistic structure and develop templates and patterns that they could draw generalizations from. This then became a very big area of research, especially in America, because of Leonard Bloomfield and others, who were inspired by the work of Ferdinand de Saussure, a Swiss linguist. Saussure wrote a very famous textbook in the 1800s that had a big influence on the American linguists and this led to the creation of structural linguistics. Up to the early 1900s, structural linguistics was the dominant paradigm both in the West and in India. The Christian missionaries played a major role in developing this methodology because their goal was to translate the Bible into local languages. They would go into parts of the world that were obscure for them and study the language there using this methodology. They’d eventually figure out the sound structure, the linguistic structure of the language and then write the Bible in that language. In the 1950s, a now-very-famous linguist turned up—Noam Chomsky. He created a new paradigm in linguistics, which came to be known as generative linguistics. This methodology involved consulting your own intuitions about what is possible and what is not possible in language. This was a brilliant new way to unpack the structure of languages, of your own native language. You could sit down and think about your language and develop a very elaborate syntax and semantics and phonology—the different sound patterns. Thus, it became mainstream and is still the mainstream approach in linguistics. It’s now called theoretical linguistics to distinguish it from the other strands which developed later. 
While Chomsky was developing the generative linguistics approach, in the 1950s there was a computer revolution. That’s when all the action started on the computer side and what happened was that people started to develop machine translation systems to automatically translate languages. It was a very ambitious program. Although the initial attempts were miserable failures, today this has become a very sophisticated new approach, and you see that in tools like Google Translate. These systems are able to do very sophisticated translations and are really very good at it! This area is known as computational linguistics. Linguists actually work for Google and other companies, developing the basic linguistic software. All this has its origins in this core linguistic work. In parallel to that, what happened in the 1960s was that linguists started working with psychologists, and started talking to them about how language works in the brain. This field ran in parallel to classical theoretical linguistics. It was an independent stream of research that eventually became psycholinguistics. There are connections between psycholinguistics and linguistic theory, but psycholinguistics is to a great extent an independent body of work that is slightly divorced from theoretical ideas.
SC: So, that same linguistic structure that people had been studying theoretically for so long, they just took it to an empirical level that way?
SV: Exactly. Data entered linguistics from the computational side.
SC: Great! But what about related fields like phonetics? Where do they come into this, exactly?
SV: Well, phonetics is actually a part of linguistics. You can think of linguistics at several levels. There’s sound: that’s where phonetics and phonology come in, which are parts of core linguistics. Then there’s semantics, or meaning, which is also a core part of linguistics and involves studying formal logic to understand how language and meaning are put together. Then there’s syntax: the structure of words, the word order and how sentences are structured. These are the core areas, but there are also related areas that fall under linguistics, like pragmatics, where you study implied meaning, that is, what is not actually said but is inferred from the sentence.
SC: That’s sharing a border with poetry, isn’t it?
SV: That is indeed where all the action comes from. In poetry, there are all these implied connotations, which you don’t actually state but feel from the language. It is also part of linguistics.
SC: Then it should also border with the social sciences, shouldn’t it? Because you’ve also got politics there, with these kinds of implied meanings, where certain things mean something specific for certain groups of people. It is probably at the overlap of semantics, linguistics and politics, for example.
SV: Yeah, that social and cultural aspect of language also evolved into its own field, and that’s called sociolinguistics: the study of the cultural and social implications of language.
SC: What then interested you, as a scientist, in this field?
SV: When I started out, I was only obsessed with languages. I was not interested in linguistics and, in fact, I thought linguistics was a very, very boring area—dry, boring and abstract. I just wanted to learn languages. My first degree was in Japanese and French at the Jawaharlal Nehru University, Delhi. After that, I ended up in Japan, translating for a law firm. I was translating patents from Japanese to English. Eventually, I realized that there must be a way to automate all this because it was just so boring. I wanted to figure out how to do automatic machine translation. That’s why I decided to quit my job and become a linguist. I then returned to India and studied linguistics, where I was trained in the generative framework. I became very interested in generative theoretical issues. I later went to Ohio State University to do a PhD. During my PhD, I realized that I wasn’t satisfied with the way data was being used to develop the theory. We were relying on intuitions and I was very unhappy with that. I realized, at that point, that I am actually by nature a hard-core empiricist. For me, the data had to be objective. That made me shift into psycholinguistics because that’s where the experimental science was! So that’s how I became a psycholinguist. Since then, about 20 years now, I have been doing experimental work, trying to get empirical data to study theory.
SC: What do you specifically study in your lab?
SV: In my lab, we are trying to study how memory is used when we are interacting with each other using language. When I speak to you, you have to hold the words I utter in your memory and put them together in real time to build the meaning that I am conveying. How does that memory process work? We’ve developed a computational model of the memory process that subserves language processing. The model makes predictions about how long it should take to read a word or a sentence, and we test those predictions against data. That’s what I’ve been working on: language comprehension processes.
SC: One thing in particular that I’ve found very inspiring about your teaching is that you don’t hesitate to say you were embarrassed by how bad your previous work was, at the lack of statistical rigour. You mentioned this at the workshop you gave at the Max Planck Institute in Leipzig in November 2019. You said then that you had to unlearn a lot of bad statistics to get from where you were as a PhD student to where you are now. How did you get there? What is it you had to unlearn in order for you to learn it the right way?
SV: When I moved into psycholinguistics, I was given a crash course in statistics. I was given a four-week course to learn enough statistics to do my PhD. What I was told was that you don’t really need to know much to do this sort of thing. You’ll need to know a few tests, you’ll need to know which buttons to press, and you’ll get the p-value out of it. We relied mostly on pre-programmed statistical software packages, and that’s how I did my PhD. My PhD was even published as an outstanding dissertation by Routledge—a British publisher. It was only after I became a Professor and started teaching this stuff that I realized that I had no idea what I was talking about! As I would read about things, I’d realise that I didn’t really grasp the deep concepts that I was trying to teach. So what happened to me was that there came a period in my career as a Professor where I had no funding at all. I had no money, so I had a lot of time on my hands. So I decided to do a Master’s in statistics from Sheffield. That was a turning point for me. I finished the MSc in 2015, and the last five years have been kind of revolutionary for me. That’s when I really started to understand what I was doing, so that has changed my lab, it has changed the work that we do, that has changed the productiveness of the work that we do. We are doing much better quality work now.
SC: Partly because, I guess, you’re not doing the work wrong the first time around. There’s no wiping out false starts.
SV: Exactly. I had been releasing all my data in the public domain from 2008 onwards with an instinctive desire to put it all out there. When the psychology revolution and the replication crisis were first being spoken about in 2015, coincidentally, I started thinking about how to systematize the workflow in my lab for my students and how to make things neat and clean when you publish a result so that the reader has access to all the materials. I also wanted to do this because others were refusing to give me their data that I wanted to use for my models. Although some labs were really open about it, most were not. I thought, “OK let me lead by example”. Instead of attacking these people, I thought let me just show them how it can be done. That’s why I became so obsessed with the Open Science movement and trying to show how you could do this. Now, I’m in a position of more influence—I’m on the editorial board of several journals and I’m trying to enforce this new policy in these journals. I am trying to make sure that people actually follow these guidelines and try to do everything right, in the best possible way.
SC: That revolution of Open Science is really catching on. Ten, or even twenty, years ago, there was no possibility of publishing one’s own code, and no online data repositories. Today, saying ‘data available on request’ is a thing of the past.
SV: Exactly. I think saying that now is a passive-aggressive way of saying ‘No, I won’t give it to you’!
SC: You’re also passing all this on to your students. On your research group’s GitHub page, you’ve also written ‘we are interested in statistical theory and practice’. So what does that mean for your students?
SV: It means that we have designed a curriculum both at the undergraduate and graduate level where we teach statistics at a level that’s not taught anywhere in the world, as far as I’m aware. We have a very extensive curriculum teaching both Frequentist and Bayesian statistics. The other thing is my PhD students follow very specific guidelines on how to release data, how to manage their papers and how to produce reproducible code, so that everybody else in the world can actually redo what we did and use our approaches for their own analyses. We are at the cutting edge of the field right now. We use the latest technological tools available, which are not available to many groups in the world. Other groups can now copy our code and use our examples for their own work.
SC: While a section of scientists is waking up to Open Science, there are others who resist open-source resources. Sometimes, this difference is the result of a generation gap, where the older ones are heard saying, ‘Oh we’ve done it this way and we’re not going to change it’. A kind of institutional inertia. There are also limitations in resources because, with many journals, you have to pay to publish open access. There is also the reputation that goes with publishing in high-profile journals.
SV: Indeed, and I have been in this position. I have published in Elsevier journals and continue to do this because that’s the only way to establish credibility. If I didn’t do that, then people would conclude that I can’t publish in top journals. I need to prove that I can do it. I can’t publish only in open access journals like PLoS ONE and hope to have any credibility in the field. I would only publish the high-impact stuff in these mainstream non-open-access journals. I have published good quality work in open access journals, and nobody notices it. But if I publish something in a top journal, then everybody notices it. So there’s a clear correlation there. I can’t fight it as a single person; one can only fight small battles. The area where I am trying to change things is making sure that all the material that is in Elsevier journals is also on OSF.
Editor’s Note: The Open Science Framework, or OSF, is an open-source online platform where scientists can publish their study designs, raw data, code and reports to promote transparency and collaboration.
SV: Anybody can access it for free. I try to make it open access when I can afford it, but I try to provide everything in parallel in the public domain. That’s all I can do right now. One day, my hope is that I will be the Editor-in-Chief of a major journal, and then I will try to enforce things in a better way. And that’s not going to happen so easily, because I’m not mainstream! I’m not a white male at a top American University. If I was, the whole story would be different! I’m an obscure scientist in the middle of nowhere, so there’s no way I will have that level of influence. For now, I need to know my place in society and work with that!