You are watching ColdFusion TV.
Hi, welcome to another ColdFusion video.
So I’m sure by this stage, most of you guys have heard of Google Duplex.
I was actually in Bulgaria when the news broke about this.
So I quickly had to make my way back to Australia to make this video.
So if I’m out of breath, that’s why.
Okay, so in this video we’ll take a deeper look at Google Duplex.
So what is duplex?
It’s basically an extension of Google assistant
that can make phone calls to real humans just by you asking it to do so.
It’s a deep neural network that builds off WaveNet technology.
WaveNet is a speech synthesis program
that works by joining very short units of sound together to create speech.
It was a breakthrough in natural speech synthesis when it came out.
We’ve already taken a look at it in a previous video,
but here’s a quick recap of what it can do.
Aspects of the sublime in English poetry and painting, 1770 to 1850.
This is what happens when you don’t type in anything for WaveNet to say.
It’s still generating raw audio to randomly imitate human sounds
and it still sounds like words.
He as the head Sheen we’ve gained two feet so it was a pat.
She did just type, dic she isn’t.
[Lip sounds and breathing]
Digger with a shavette to dig could dig slow.
Duplex is another neural network built on top of WaveNet.
The final result is an AI that can hold realistic conversations,
but with WaveNet’s vocal precision.
And here’s the grand unveiling of Duplex at Google’s I/O event in May of 2018.
But even in the US,
60% of small businesses don’t have an online booking system set up.
So what you’re going to hear is the Google assistant actually calling a real salon
to schedule the appointment for you.
Hello, how can I help you?
Hi, I’m calling to book a women’s haircut for a client.
I’m looking for something on May 3rd.
Sure, give me one second.
Sure, what time are you looking for around?
At 12 pm.
Ok, we have a 10 o’clock.
10 am is fine.
Okay, perfect. So I will see Lisa at 10 o’clock on May 3rd.
Okay great, thanks.
Great. Have a great day. Bye.
[Whistles and applause]
Hi, may I help you?
Hi, I’d like to reserve a table for Wednesday, the 7th.
For seven people?
Umm… It’s for 4 people.
Four people? When? Today? Tonight?
Umm… Next Wednesday, at 6pm.
Actually, we reserve for like upwards of five people.
For four people, you can come.
How long is the wait usually to be seated?
For when? Tomorrow? Or weekend? Or…
For next Wednesday, the 7th.
Oh no, it’s not too busy. You can come for four people, okay?
Oh, I gotcha. Thanks.
[Whistles and applause]
But our assistant understood the context and the nuance.
It knew to ask for wait times in this case
and handled the interaction gracefully.
According to Google’s blog, the ums and ers that you hear
are sometimes inserted synthetically,
but at other times they’re there to signal that the system is still processing,
just like a human would.
The public reaction to this was on the side of shock and horror
and even anger by some.
Some people thought it was very deceitful
to have an AI talk to someone over the phone without them knowing.
But Google has made it very clear that they’re going to be transparent.
They’ll be letting the people on the other end of the phone know that they’re talking to Duplex.
Duplex has been trained in the narrow field of scheduling appointments or bookings,
and inquiring about a business’s opening hours on holidays.
To be clear, Duplex cannot have general conversations.
But I have little doubt that the scope will widen in the coming years.
Google stated that there were unique challenges when it came to training such a neural net.
How do you get an AI to robustly understand natural language
and reply in a realistic manner?
This will be pretty hard to do.
For example, people tend to talk differently to one another than to computers.
We talk faster, correct ourselves mid-conversation,
and even omit parts of conversation and rely on context instead.
Throw in the poor quality and noisiness of a phone line,
and you have a pretty hard challenge on your hands.
To solidify this point a little bit,
let’s think about the phrase, “Ok for 4”.
It’s such a simple sentence but it relies on many previous sentences for context.
This phrase could refer to a time or an amount of people.
We as humans take such things for granted,
but it’s interesting to take a brand new look at this through the eyes of an AI
or a team of researchers trying to solve this problem.
OK, so how does Duplex work?
Duplex uses something called a recurrent neural network.
If you don’t know what a neural network is,
it’s basically a massive matrix multiplication function
where each part of the matrix is built up of artificial neurons, called nodes.
The nodes contain a mathematical formula and are arranged in layers.
And each node has an input and an output.
After receiving the inputs, whatever they may be,
the end goal of the whole matrix, or neural network,
is basically to figure out how to reduce how wrong it is,
or in other words, reduce the amount of error.
Perhaps the strangest thing about neural networks
is that no one actually knows how they come to their conclusion.
You just give them the inputs and they somehow get an answer.
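To make the idea above a little more concrete, here’s a toy sketch of a single-layer network: each node’s output is a weighted sum of the inputs (the “matrix multiplication”), and training just nudges the weights to reduce the error. All the numbers and function names here are made up for illustration; this is nothing like Duplex’s actual implementation.

```python
# Toy single-layer network: outputs are weighted sums of inputs,
# and training nudges the weights to reduce the error.

def forward(weights, inputs):
    # One layer: each output node is a weighted sum of the inputs.
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

def train_step(weights, inputs, target, lr=0.01):
    # Measure how wrong each output is, then adjust each weight
    # slightly in the direction that reduces that error.
    outputs = forward(weights, inputs)
    new_weights = []
    for row, out, tgt in zip(weights, outputs, target):
        error = out - tgt
        new_weights.append([w - lr * error * x for w, x in zip(row, inputs)])
    return new_weights

weights = [[0.5, -0.2], [0.1, 0.3]]   # 2 nodes, 2 inputs each
inputs, target = [1.0, 2.0], [1.0, 0.0]

for _ in range(200):
    weights = train_step(weights, inputs, target)

print(forward(weights, inputs))  # approaches the target [1.0, 0.0]
```

After a couple hundred steps the outputs land very close to the target — the network has “learned” the mapping, even though no one ever told it the weights directly.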
Neural networks have been around for a while.
In fact, Ted Hoff, the guy who helped create the very first CPU at Intel back in 1971,
actually worked on neural networks in his early career
but practical neural networks have only been possible in the past five years or so.
Since about 2012 the general complexity of neural networks has advanced 500 times,
and I think it’s one of the most fascinating fields of computer science.
If you want a more detailed explanation of how neural networks work,
there’s an absolutely brilliant video by the YouTube channel 3Blue1Brown.
It’s a remarkable explanation. I’ll leave a link in the description below.
But anyway, I’m getting sidetracked.
So the specific type of neural network that Duplex is using,
as I mentioned, is called a recurrent neural network.
This type of network has a small internal memory
that allows it to remember specific inputs to help understand context.
For this reason, it’s pretty much the perfect neural network for speech recognition
and it’s at the heart of most speech recognition algorithms.
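Here’s a bare-bones sketch of that “small internal memory”: at each step, the new hidden state blends the current input with the previous hidden state, so earlier inputs influence how later ones are interpreted. The weights are hypothetical constants picked for illustration; a real recurrent network learns them from data.

```python
import math

def rnn_step(hidden, x, w_in=0.8, w_rec=0.5):
    # New memory = squashed blend of the current input
    # and the previous memory (hidden state).
    return math.tanh(w_in * x + w_rec * hidden)

def run_rnn(sequence):
    hidden = 0.0  # start with an empty memory
    states = []
    for x in sequence:
        hidden = rnn_step(hidden, x)
        states.append(hidden)
    return states

# The same final input (1.0) produces a different state depending on
# what came before it -- the network "remembers" the context.
print(run_rnn([0.0, 0.0, 1.0])[-1])
print(run_rnn([1.0, 1.0, 1.0])[-1])
```

That context-carrying state is exactly what a phrase like “Ok for 4” needs: the same words get interpreted differently depending on what was said earlier in the call.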
So as I mentioned earlier,
Duplex was trained on a whole bunch of different phone conversations.
But how did it learn to understand what was going on?
Well, the conversations, of course, start as analog speech,
and then this speech in audio form is fed into Google’s automatic speech recognition system.
From this point on, the audio is understood as text.
This text, once converted into a format that the neural network can understand,
is then fed into the system.
Other metadata and wider context information from the calls,
such as the desired time for the appointment in question, or the time of day,
is also fed into the system.
So tying it all together
when the neural network listens to a whole bunch of phone recordings as inputs,
it eventually learns how to reduce the amount of error it has,
meaning that it has better responses when spoken to.
The final trained neural network that comes out of all of this is Google Duplex.
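The pipeline just described can be sketched schematically like this: audio goes through speech recognition, the resulting text plus call metadata is fed to the model, and the model’s reply is synthesized back into speech. Every function body here is a hypothetical stand-in, not Google’s actual API.

```python
# Schematic of the call pipeline; all bodies are hypothetical stand-ins.

def speech_to_text(audio: bytes) -> str:
    # Stand-in for Google's automatic speech recognition system.
    return "we have a 10 o'clock"

def model_reply(text: str, metadata: dict) -> str:
    # Stand-in for the recurrent network: combines the transcribed
    # text with wider context, such as the requested appointment time.
    if "10 o'clock" in text and metadata["requested_time"] == "12 pm":
        return "10 am is fine."
    return "Okay, thank you."

def text_to_speech(text: str) -> str:
    # Stand-in for WaveNet-style speech synthesis.
    return f"<audio: {text}>"

metadata = {"requested_time": "12 pm", "service": "haircut"}
reply = model_reply(speech_to_text(b"..."), metadata)
print(text_to_speech(reply))  # <audio: 10 am is fine.>
```

The point of the sketch is the data flow, not the logic: speech in, text plus context through the model, speech back out.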
How can I help you?
– Hello? – Hello, what’s up, man?
Hey, um… I wanted to know what are your hours for today.
10 a.m. to 6:30 p.m.
Okay, got it. Thank you for your time.
No problem, sir. Bye.
So what can Duplex do?
In addition to what was shown in the Google demo,
Duplex can also do some pretty interesting things,
such as handle interruptions.
Ok, what’s your phone number?
– Two two three. – Two two what?
– Two two three. – Okay, two two three.
Hi, I would like to reserve a table for May 25th.
Sorry, what day?
For Friday, um… May 25th.
And respond to audio issues over the phone.
– Are you here? – Yeah, I am here.
So there was a lot of talk about this AI passing the Turing test.
So for those of you that don’t know, the Turing test was a thought experiment
brought up by Alan Turing, one of the fathers of computer science, back in 1950.
He proposed a test that goes as follows:
A person would interact with a machine that they couldn’t see over text.
This individual doesn’t know if they’re talking to another person or a machine.
If the machine can interact with this individual
without them suspecting that they’re talking to a computer
then that machine passes the Turing test.
Alan Turing made the prediction that by the year 2000,
we would have machines that would be able to pass the test.
And he wasn’t that far off.
The first machine to pass the test was a text-based chat bot in 2014.
The Turing test was originally just for text.
And Google Duplex seems to have passed the Turing test in the very narrow field of conversation
when it comes to making appointments,
but through voice, not just text.
I bet that’s definitely something that a lot of leading computer scientists didn’t see happening just yet,
so I found that point very interesting.
So some final thoughts:
Some people may think that this may take jobs away from telemarketers,
and other phone based work in the future.
This may very well be possible,
but I’d just be being reactionary if I were to say that this was a certainty,
or that this will be a net negative on society in any way,
because at this point it’s far too early to tell conclusively.
A reminder: Duplex isn’t for general conversation.
It’s only for booking reservations and inquiring about opening hours during holidays.
But I’m sure its scope is going to grow.
As far as the assistant goes,
due to its limited scope, it doesn’t really change how we live our lives,
unless you have a disability I’d say.
If a store has an online page,
it’s far quicker just to book that way,
but then again Duplex is using a neural network.
And time and time again on this channel, we’ve seen how those things can surprise us.
The scope may just grow quicker than we think.
But on a wider note, this technology is pretty cool.
It’s becoming clear that we’re at an inflection point when it comes to AI.
In the past couple of years, we’ve been marveling at AI breakthroughs like AlphaGo and others.
But now it seems like we’re starting to see real-world applications of AI coming into view,
each time making what was once thought impossible, possible.
I think this will start to get more and more commonplace
as breakthroughs keep occurring in parallel.
So am I worried about Duplex?
Not at all, really. I don’t see anything yet that should make me worried.
Google has stated that they’re going to be transparent when you’re talking to a Duplex AI.
But if this technology becomes commonplace,
ask me that question again.
But I think all in all, society has to accept that we’re at the doorstep of a brand-new era,
a time with unimaginable possibilities.
I think that’s a bit of a privilege as it stands right now.
Anyway, that just about wraps up this video. Thanks for watching.
This has been Dagogo, and you’ve been watching ColdFusion.
Feel free to subscribe if you just stumbled across this channel.
And I want to thank all of you guys that came to the Dubai Blockchain Summit.
It was really cool meeting some of you guys.
Anyway that’s it. I’ll catch you again soon for the next video.
Cheers guys, have a good one.
ColdFusion. It’s new thinking.