The Samantha-effect: a closer look into the future of bots

Posted by Justin Lee on 3/1/18 8:48 AM

In 2013, Spike Jonze released the indie sci-fi drama-romcom ‘Her’.


The film follows the evolutionary trajectory of the world’s first artificially intelligent operating system, ‘Samantha’. 

Scarlett Johansson references aside, Samantha is every developer’s daydream: the perfect example of ‘real’ Artificial Intelligence (AI). An impressive conversationalist, she grasps context, natural language, emotions and common sense.

Obviously, modern bots aren’t quite on Samantha’s level. At best, they’re capable of offering specific results via a conversational interface. At worst, they’re chuck-your-phone-against-the-wall frustrating.

With technology evolving at lightning speed, could we be moving towards an era where a Samantha-like bot could exist?

Emotionally intelligent data

Today’s AI combines symbolic processing (AI using explicit rules and logic) with machine learning in a way that exploits their respective strengths and reduces their weaknesses.

For example, symbolic processing lets us specify knowledge and behaviours that may be difficult to learn from just data. Machine learning complements this by helping the system adapt to unexpected situations and new concepts.
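To make that concrete, here’s a minimal sketch (in Python, with made-up intent names and a stubbed-out classifier, not anyone’s real product) of how explicit rules and a learned model might sit side by side inside a bot:

```python
# A minimal sketch of combining hand-written rules with a learned model.
# The intent names and the classifier are placeholders for illustration only.

def rule_based_intent(message: str):
    """Symbolic layer: explicit rules for behaviour we know how to specify."""
    text = message.lower()
    if "refund" in text:
        return "refund_request"   # a business rule we never want a model to miss
    if text.startswith(("hi", "hello")):
        return "greeting"
    return None                   # no rule fired; defer to the learned model

def ml_intent(message: str):
    """Statistical layer: a trained classifier covers everything the rules don't.
    Stubbed out here; in practice this would be any text classifier."""
    return "small_talk"           # placeholder prediction

def classify(message: str) -> str:
    return rule_based_intent(message) or ml_intent(message)

print(classify("Hello there!"))    # handled by the symbolic rules
print(classify("Tell me a joke"))  # falls through to the learned model
```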

This combination also has useful properties beyond speeding up the learning process. In the future, it could give us the ability to interact in a more human-like way with bots.

That’s because, for all its enticing qualities, machine learning is still basically a statistical process: it can only ever be as good as the data it depends on.

This is a blessing and a curse. The sensitivity of machine learning to the characteristics of its input data means that it can easily learn the wrong thing, as demonstrated in the case of Tay, Microsoft’s very own racist robot.

It’s also been argued that the data selected for machine learning often reflects the unconscious biases of the researcher/developer (usually young white males).

To approach Samantha’s capacity for introspection and reasoning, we would need to program some kind of Artificial Emotional Intelligence into our bots, enabling them to answer questions like

“Why did you choose to do this?”

And even today’s relatively simple systems are taking steps towards synthesizing individualistic answers to questions.

But in the meantime, ensuring that we apply our machine learning algorithms to good-quality data in a closely monitored environment is the closest we will get to emotionally intelligent machines.

The Semantic Web

World Wide Web creator Tim Berners-Lee was way ahead of the times when he proposed the Semantic Web back in 2001:

‘a system that would allow computers to infer meaning from the relationships between resources in the Web’.

Essentially, machines would be able to link together ideas, concepts and facts instead of documents and pages.

This would allow Samantha-like ‘computer assistants’ to ‘read’ information about us and operate on our behalf: autonomously scheduling our appointments, organizing our travel, making dinner reservations. Hello, PA of our dreams.
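As a toy illustration of the difference, here are ‘facts linked to facts’ in a few lines of Python. The appointment, address and restaurant are entirely made up, and a real Semantic Web would use standard vocabularies rather than ad-hoc strings:

```python
# A toy illustration of the Semantic Web idea: storing facts as linked
# (subject, predicate, object) triples instead of documents, so an assistant
# can chain them together. All names and facts here are invented.

facts = {
    ("Justin", "has_appointment", "dentist_tuesday_3pm"),
    ("dentist_tuesday_3pm", "located_at", "12 High Street"),
    ("12 High Street", "near_restaurant", "Luigi's"),
}

def objects(subject, predicate):
    """Return every object linked to a subject by a given predicate."""
    return [o for s, p, o in facts if s == subject and p == predicate]

# Chain facts: where is my appointment, and what's a dinner option nearby?
appointment = objects("Justin", "has_appointment")[0]
address = objects(appointment, "located_at")[0]
dinner = objects(address, "near_restaurant")[0]

print(f"Your appointment is at {address}; {dinner} is nearby for dinner.")
```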

Berners-Lee’s vision has already been realized to an extent by Google’s Knowledge Graph, which required Google to hire thousands of people to input explicit rules into a symbolic representation of common-sense knowledge.

This allows Google to answer questions with a little box of structured data, rather than just a list of web pages.

Image recognition technologies might also be a precursor to the semantic web; this AI can already recognize keywords, demographics, colours and faces in images.

Natural speech recognition

The ability to understand natural language is central to a successful bot, in text but also in voice.

Ideally, a bot should be able to reach its goal by filling in the conversational blanks autonomously.

Siri, Cortana and Alexa aren’t quite at Samantha’s standards yet. But her performance in this area doesn’t seem out of reach.

Speech recognition error rates are falling at roughly 20% each year, with Google’s recently reaching just 4.1% — almost as accurate as that of humans.
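The figure usually quoted here is word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the recognised transcript into the correct one, divided by the length of the correct transcript. A small Python sketch of the calculation:

```python
# Word error rate: edit distance over words between the reference transcript
# and the recogniser's output, divided by the reference length.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Standard edit-distance dynamic programme over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("book a table for two", "book a table for you"))  # 0.2
```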

These improvements can be attributed to a series of innovations: multi-microphone arrays, directional beams, sophisticated noise processing and the application of voice biometrics.

Auditory scene analysis is another promising technique that tries to separate different sound sources.

Voice recognition also benefits from the increasingly huge pool of data used to train statistical models with machine learning techniques, one of which is Deep Neural Networks (DNNs).

DNNs consist of multiple connected layers of processing units inspired by the neural networks of the human brain.

They can classify a variety of inputs — images, word sequences, locations and speech utterances — into desired categories, such as words, objects and meaning representation.
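As a very rough sketch of what such a network looks like in code, here’s a tiny feed-forward classifier in PyTorch; the layer sizes and the three categories are arbitrary, chosen purely for illustration:

```python
# A minimal deep neural network: stacked layers of processing units mapping
# an input vector to one of a few categories. Sizes and labels are arbitrary.

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(40, 128),   # e.g. 40 acoustic features from one frame of speech
    nn.ReLU(),
    nn.Linear(128, 64),   # a second hidden layer of processing units
    nn.ReLU(),
    nn.Linear(64, 3),     # scores for 3 made-up categories: "yes", "no", "maybe"
)

features = torch.randn(1, 40)              # a stand-in for one speech frame
predicted = model(features).argmax(dim=1)  # pick the highest-scoring category
print(predicted.item())
```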

Context and abstraction

The type of reasoning we do as humans relies on understanding context and making inferences. Sometimes these are logical, but more often than not they’re based on our common knowledge of the world: realizing that X will probably lead to Y.

Applied to AI, this means the ideal virtual assistant should be able to propose alternatives when faced with a constraint, considering different possibilities and weighing their merit:

Me: “I’d like to buy some shoes at Melissa today”
Bot: “Sorry, Melissa is closed. Why don’t you try Office, which is similar and nearby?”
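For that particular exchange, the logic can be hand-coded. Here’s a hedged sketch, using an invented table of shops in place of the live data a real assistant would draw on:

```python
# A sketch of how a bot might propose an alternative when a request hits a
# constraint. The shop data is invented; a real assistant would pull opening
# hours and similarity from live sources.

shops = {
    "Melissa": {"open": False, "category": "shoes", "area": "Covent Garden"},
    "Office":  {"open": True,  "category": "shoes", "area": "Covent Garden"},
    "Lush":    {"open": True,  "category": "cosmetics", "area": "Covent Garden"},
}

def suggest(shop_name: str) -> str:
    shop = shops[shop_name]
    if shop["open"]:
        return f"{shop_name} is open. Shall I get directions?"
    # Constraint hit: look for something open, similar and nearby instead.
    for name, other in shops.items():
        if (name != shop_name and other["open"]
                and other["category"] == shop["category"]
                and other["area"] == shop["area"]):
            return f"Sorry, {shop_name} is closed. Why don't you try {name}, which is similar and nearby?"
    return f"Sorry, {shop_name} is closed and I couldn't find a good alternative."

print(suggest("Melissa"))  # -> suggests Office
```

The hard part, of course, is everything we haven’t hand-coded.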

A bot could, in principle, learn these associations through trial and error, but this would take ages: for every interaction there are a million different variations.

Some AI researchers think the key is for computers to learn by analogy. It’s this capacity that allows humans to generalise from one situation to another, and to abstract away from context to a higher level of understanding.

Human-sounding bots

The emotional range and inflection in Samantha’s voice is still beyond our reach, but new speech generation models such as Google WaveNet are able to produce ‘eerily convincing’ artificial voices.

No more robotic Siri.

A natural-sounding voice massively increases our impression of a machine’s intelligence. The downside is that it raises expectations on the part of the user, potentially leading to conversational failures and frustration.

And we all know how much less patience we have with a machine than with a human being…

Whether or not a bot could live up to our expectations, we’re predisposed to put our faith in machines that exhibit human-like behaviour (wordplay, puns, jokes, quotations and emotional inferences).

This brings to mind the Turing test, wherein a human judge engages in a conversation with another human and a machine. If the judge is unable to tell the machine from the human, the machine has passed the test.


Can AI help us develop ‘insight’?

An insight, at its core, gives you something that is both new and valuable. Most importantly, though, it helps guide future decisions and actions — something tricky for software to understand.

We know an insight when we see it, but it’s hard to define or draw sharp boundaries around. And developing an insight involves identifying patterns, relationships and correlations.

Some machine learning projects have already delved into this area.

For example, Google’s DeepMind (which also uses DNNs) is good at identifying new patterns, at a level of complexity that in one case (the game Go) exceeds that of human beings. But techniques like this only work if we tell the machine what the goal is (in this case: winning the game).

Only human beings can identify interesting new patterns without a pre-set goal: a machine can spot patterns once it understands the goal, but only humans can spot them creatively.

On the other hand, one problem humans face is that when we focus on a specific task, we rely on a lot of information. Some of that information has been acquired unconsciously, through experience; some of it through deliberate learning.

But as the volume of information increases, the proportion of it we can sift through becomes smaller and smaller.

So here’s where machines have an advantage: they’re better at dealing with vast pools of data.

When we assemble and integrate this data, machines can collaborate with us in the areas where we struggle.

And at the moment, most digital information is in the form of text — that is, the data is unstructured, rather than the structured data found in traditional databases.

Textual data

This takes us into the area of machine reading which, over the last 20-odd years, has moved out of the research labs and into commercial applications.

While still far from perfect, automated techniques for understanding written texts have recently matured rapidly, boosted by new developments in AI.

Here’s an example.

A patient walks into a doctor’s office with a rare disease that the doctor isn’t familiar with. The doctor consults his digital assistant, which scans, in seconds, all the medical journals, facts and information ever written about the disease.

It then summarizes this information and presents it to the doctor in bite-sized chunks.

Maluuba, a deep learning startup recently acquired by Microsoft, is trying to develop a ‘literate machine’ that can read text and learn to communicate on that basis.

Essentially, computers can survey, curate and summarise massive amounts of text in ways we could never dream of.

And we don’t have to use highly technical computer programming languages or database queries to say what we are looking for: recently developed systems can also use natural language to ask questions about a text.
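For example, an off-the-shelf question-answering model from the Hugging Face transformers library will take a plain-English question about a passage of text (it downloads a pre-trained model on first run; the passage below is our own wording):

```python
# Asking a natural-language question about a text with an off-the-shelf
# question-answering pipeline from the transformers library.

from transformers import pipeline

qa = pipeline("question-answering")

passage = (
    "Maluuba is a deep learning startup that was acquired by Microsoft. "
    "The team works on machine reading: teaching computers to read text "
    "and answer questions about it."
)

result = qa(question="Who acquired Maluuba?", context=passage)
print(result["answer"])  # expected: "Microsoft"
```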

This is something we’re already familiar with to some extent, thanks to typing queries into the likes of Google (auto-complete makes this even easier).

All of this makes bots natural inheritors of that tradition.

The final flourish is that we now have personalization: apps that learn by interaction.

For instance, Replika develops idiosyncratic patterns of language by scanning through past chat sessions, building up knowledge of a user’s preferences and priorities.
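The mechanics behind that kind of personalization can be surprisingly simple. Here’s a crude sketch of the idea only, not of how Replika actually works: mine past messages for recurring topics and fold the favourite into the next reply:

```python
# A rough sketch of personalising replies by scanning past chats:
# count recurring topics and reuse the most frequent one.
# Purely illustrative; the messages are invented.

from collections import Counter

past_messages = [
    "I went climbing again this weekend",
    "Work was stressful, so I booked a climbing session",
    "Thinking about a climbing trip to Fontainebleau",
]

STOPWORDS = {"i", "a", "to", "the", "so", "was", "this", "about", "again"}

words = Counter(
    word
    for message in past_messages
    for word in message.lower().split()
    if word not in STOPWORDS
)

favourite_topic, _ = words.most_common(1)[0]
print(f"How's the {favourite_topic} going lately?")  # -> "How's the climbing going lately?"
```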

And this takes us very close to the territory of Samantha.

AI-powered human intelligence

During the first industrial revolution, machines began to take over manual labour.

Today, with Industry 4.0, they’re taking over routine mental labour, too.

Almost any routine-based task can be automated.

And, as salespeople and marketers, we have plenty of everyday tasks we wouldn’t mind relieving ourselves of, like:

…manually inputting data, searching for topics and article ideas, wading through endless emails…

By passing these on to an Artificially Intelligent ‘mind’, we’re able to focus our efforts on areas that we (as humans) excel in, and that distinguish us (as humans) from machines.

This was our starting point in developing GrowthBot, our sales and marketing bot designed to help you grow your business.

AI can’t produce new insights, but it can help us bring our creativity into play more effectively.

We might look at a bunch of data, relationships and correlations, and think we see something new, something that helps us understand the world better.

After all, that’s what humans are good at.

But is that insight a mere hunch, or more than that? That’s where we can instruct automation to step in, to eliminate assumptions and create transparency.

The way we see it, the most promising aspect of AI is not the ability to replicate a lifelike companion like Samantha, but the amplification of our own intelligence as human beings.