ELI5: Language Grounding

Created: November 21, 2020 8:02 AM
Tags: AI, Grounding, NLP

This post is an attempt to illustrate my current research interests to outsiders.

If a curious relative, a friend at a party, or a little kid asked me what my research is about, I'd try to have the following conversation. Of course, I'd adapt it somewhat depending on the person.

Carli Curious: So what is your research roughly about?

Me: Okay, let's start with a quick thought experiment. Imagine you are living on an island with your fellow people, isolated from all civilization. For simplicity, let's say all of you only speak English. But one day a stranger who doesn't speak English is stranded on your shores, and there is no Internet to help you out. What do you do to integrate them and teach them English?

Carli Curious (after some thinking): I would first try pointing at common objects like food, plants and so on. That might take a few days. Then I might move on to verbs: I would demonstrate actions and hope the stranger associates each verb with its action. Later on, I would somehow have to teach words like "not", "all" and so on. That might be a bit trickier...

But what does all that have to do with your research?

Me: I want to do with a computer what you just described with the stranded stranger. I want the machine to learn language (words, sentences and so on) by connecting it to the real world. The thing is: while that seems intuitive, most current AI models don't do that. Instead, they have no connection to the real physical world and only get raw, static text. And this works surprisingly well, since computers are way better at keeping track of statistics and correlations than our stranded human. So, to come back to our scenario, here is an analogy for how current AI models often work: you would hand the stranded person all your documents. Let's assume you are all very organized people on this island, so you have documented everything about your island's history, animals, people and even fiction. Now you would let the stranger read all of that for a few years in an isolated cabin (poor human!) and then come back to test how well they can write in your language. By "test", I mean, for example, giving the stranger a sentence to complete based on their understanding of the documents they have read, like "On top of every pizza margherita, we put tomatoes and [BLANK]." Even after reading your documents for a few years, this might be tricky for a human, since our "software" is different from that of current language models. But they might get lucky this time and remember that they have often seen the symbols "pizza", "margherita" and "tomatoes" together with the symbol "cheese", and might therefore guess correctly. It's tempting to be biased here because, when you look at symbols like "tomato" or "pizza", they inevitably evoke some meaning in your mind. So really try to put yourself in their shoes instead: you see the arbitrary symbols "dfsdfh" (pizza), "wascjks" (margherita) and "öbhjf" (tomatoes) and are expected to say "teüwh" (cheese), simply because you have seen "teüwh" a lot in those contexts.
That's what our models do and they are really good at it!
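To make the "arbitrary symbols" point concrete, here is a deliberately tiny Python sketch. It is not how modern language models actually work (they are far more sophisticated); it only illustrates the core idea of guessing a blank purely from which symbols tend to appear together. The toy corpus and the names `cooccurrence_counts` and `fill_blank` are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy "island documents": each sentence is a list of opaque symbols.
# To the stranger (and to this program) they are meaningless strings:
# dfsdfh = pizza, wascjks = margherita, öbhjf = tomatoes, teüwh = cheese.
corpus = [
    ["dfsdfh", "wascjks", "öbhjf", "teüwh"],
    ["dfsdfh", "öbhjf", "teüwh", "basil"],
    ["wascjks", "teüwh", "öbhjf"],
    ["fish", "boat", "net"],
    ["dfsdfh", "oven", "teüwh"],
]

def cooccurrence_counts(sentences):
    """Count how often each pair of symbols appears in the same sentence."""
    counts = Counter()
    for sentence in sentences:
        for a, b in combinations(set(sentence), 2):
            counts[(a, b)] += 1  # store both orders so lookups are symmetric
            counts[(b, a)] += 1
    return counts

def fill_blank(context, sentences):
    """Guess the missing symbol: the one that co-occurs most with the context."""
    counts = cooccurrence_counts(sentences)
    candidates = {w for s in sentences for w in s} - set(context)
    scores = {w: sum(counts[(c, w)] for c in context) for w in candidates}
    return max(scores, key=scores.get)

# "On top of every dfsdfh wascjks, we put öbhjf and [BLANK]."
print(fill_blank(["dfsdfh", "wascjks", "öbhjf"], corpus))  # teüwh
```

The program "knows" nothing about cheese; "teüwh" simply wins because it co-occurs with the context symbols eight times, while the runner-up ("basil") does so only twice.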

Carli Curious: Yeah, sounds hard! But don't we sometimes hear that some people can learn a language just by being immersed in a new country? Do we really need to teach them the language by actively pointing at stuff?

Me: Good point! Yes, our stranded stranger could hopefully also learn English just by living with us for a few years, like babies born on this island do. They would eat and work with us, and whenever they eat a pizza with us, they might hear (or read) the symbol "pizza". This also involves statistics, but of a different kind than when the person had to read all those documents. After eating and baking pizza with us several times, the stranger's mind would make sense of things: "Aha! I've seen, tasted and even made this thing a few times, and in that context I've often heard the sound 'pizza'! These experiences of mine must be related to this sound." It's still statistics, but a kind we are naturally good at, and one with much richer structure, in the sense that seeing and making a pizza involves geometric, physical and other properties. Machines are not inherently good or bad at particular kinds of statistics (since we can, to some extent, control that), but they should learn more, and more quickly, when there is more structured content to learn from: the rich real world around us (I gesture around us for Carli Curious).
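As a rough illustration of this grounded kind of statistics, here is a toy Python sketch in the spirit of what cognitive scientists call cross-situational word learning: the learner never gets a definition, it only notices which perceived things keep showing up whenever a word is heard. The episodes, the percept labels and `learn_groundings` are all hypothetical; a real grounded model would work with raw images, sounds and actions rather than neat labels.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each episode pairs what the stranger perceives
# (opaque percept labels, named here only for readability) with the words
# they hear at the same time.
episodes = [
    ({"round_food", "oven", "dough"}, ["pizza", "hot"]),
    ({"round_food", "table", "plate"}, ["pizza", "eat"]),
    ({"scaly_animal", "boat", "net"}, ["fish", "catch"]),
    ({"round_food", "oven", "fire"}, ["pizza", "bake"]),
    ({"scaly_animal", "plate", "table"}, ["fish", "eat"]),
]

def learn_groundings(episodes):
    """Map each heard word to the percept it co-occurred with most often."""
    counts = defaultdict(Counter)  # word -> Counter of percepts seen alongside it
    for percepts, words in episodes:
        for word in words:
            counts[word].update(percepts)
    # Words heard only once (like "hot") stay ambiguous: all their percepts
    # tie, and one of them is picked arbitrarily.
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

meanings = learn_groundings(episodes)
print(meanings["pizza"], meanings["fish"])  # round_food scaly_animal
```

No single episode tells the learner what "pizza" refers to, since an oven, dough or a table is always around as well. Only across situations does `round_food` stand out, which is exactly the "aha!" moment of our stranger.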

Once this person is reasonably fluent, they can learn new words and phrases just from raw text. So after this initial "grounding" phase, it might be a good idea to give them your documents and let them learn new words, just as you and I can look up an unfamiliar word in a dictionary because we already learnt the dictionary's language in a grounded way.

One final note, as a teaser for our next conversation: apart from this connection to the real world, we also have to move from static language (like the documents the stranger had to read in their cabin) to dynamic language, such as dialogue with questions, inferring the other person's state, and so on. A lot to work on!