
The era of fake writing is upon us – 译学馆

The era of fake writing is upon us

A mildly fun thing to do when you’re bored
is to start the beginning of a text message,
and then use only the suggested words to finish it.
“In five years I will see you in the morning and then you can get it.”
The technology behind this text prediction is called a language model:
a computer program that uses statistics
to guess the next word in a sentence.
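The kind of statistical next-word guessing described here can be sketched with a toy bigram model, which simply counts which word most often follows which. This is an illustration only; phone keyboards use far larger models, and the tiny corpus below is hypothetical.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for a user's typing history (hypothetical).
corpus = "i will see you in the morning and then you can get it".split()

# Count how often each word follows each other word (bigram statistics).
follow_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follow_counts[prev][nxt] += 1

def suggest(word):
    """Return the statistically most likely next word, or None if unseen."""
    counts = follow_counts.get(word)
    return counts.most_common(1)[0][0] if counts else None

# Starting from "in", repeatedly accept the top suggestion,
# just like tapping the middle suggestion on a phone keyboard.
word, message = "in", ["in"]
for _ in range(4):
    nxt = suggest(word)
    if nxt is None:
        break
    message.append(nxt)
    word = nxt
print(" ".join(message))  # -> in the morning and then
```

The model has no idea what the sentence means; it only reproduces the statistics of the text it was counted from, which is why the completions sound plausible but say nothing in particular.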
And in the past few years,
some newer language models
have gotten really, weirdly good at
generating text that mimics human writing.
“In five years, I will never return to this place.
He felt his eye sting and his throat tighten.”
The program completely made this up.
It’s not taken from anywhere else,
and it’s not using a template made by humans.
For the first time in history,
computers can write stories.
The only problem is that
it’s easier for machines to write fiction than to write facts.
OPEN SOURCED | Recode by Vox
Language models are useful for a lot of reasons.
They help speech-to-text applications recognize speech properly
when the sounds are ambiguous.
And they can make translations more fluent
when a word in one language maps to multiple words in another.
But if you asked language models to simply generate passages of text,
the results never made much sense.
So the kinds of things that made sense to do
were generating single words
or very short phrases.
For years, Janelle Shane has been experimenting with language generation
for her blog AI Weirdness.
Her algorithms have generated paint colors (“Bull Cream”),
Halloween costumes (“Sexy Michael Cera”),
and pick-up lines (“You look like a thing and I love you”).
But this is what she got in 2017
when she asked for longer passages,
like the first lines of a novel:
“The year of the island is discovered the Missouri of the galaxy
like a teenage lying
and always discovered the year of her own class-writing bed.”
Shane: It makes no sense.
Compare that to this opening line from
a newer language model called GPT-2:
“It was a rainy, drizzling day in the summer of 1869.
And the people of New York,
who had become accustomed to the warm, kissable air of the city,
were having another bad one.”
Joss: It’s like it’s getting better at bullsh*tting.
Shane: Yes, yes, it is very good at generating scannable, readable bullsh*t.
Going from word salad to pretty passable prose
took a new approach in the field of natural language processing.
Typically, language tasks have required carefully structured data.
You need thousands of correct examples to train the program.
For translation, you need a bunch of samples of the same document in multiple languages.
For spam filters, you need emails that humans have labeled as spam.
For summarization, you need full documents plus their human-written summaries.
Those data sources are limited
and can take a lot of work to collect.
But if the task is simply to guess the next word in a sentence,
the problem comes with its own solution.
So the training data can be any human-written text,
no labeling required.
This is called self-supervised learning.
That’s what makes it easy and inexpensive to gather data,
which means you can use a lot of it.
Like all of Wikipedia, or 11,000 books,
or 8 million websites.
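Because the “label” for every position is simply the next word of the text itself, training examples can be manufactured from any raw text. A minimal sketch of how such self-supervised pairs are built (the text and variable names are illustrative):

```python
# Any human-written text works as training data; no human labeling needed.
text = "the cat sat on the mat".split()

context_size = 3  # how many preceding words the model sees (arbitrary choice)

# Each example pairs a context window with the word that actually follows it.
# The "answer" comes for free from the text itself: that is self-supervision.
examples = [
    (text[max(0, i - context_size):i], text[i])
    for i in range(1, len(text))
]

for context, target in examples:
    print(f"input: {' '.join(context):<12} -> predict: {target}")
```

Every sentence on Wikipedia or in a scraped web page yields one such example per word, which is why these models can be trained on billions of examples without anyone writing a single label.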
With that amount of data, plus serious computing resources,
and a few tweaks to the architecture and size of the algorithms,
these new language models build vast mathematical maps
of how every word correlates with every other word,
all without being explicitly told any of the rules of grammar or syntax.
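A minimal version of such a map is a table of co-occurrence counts, built from raw sentences with no grammar rules supplied. This is a toy sketch with made-up sentences; real models learn dense vector representations rather than raw counts.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Tiny stand-in corpus (hypothetical).
sentences = [
    "the cat chased the mouse".split(),
    "the dog chased the cat".split(),
]

# Count how often each pair of distinct words appears in the same sentence.
# No rule of grammar or syntax is ever stated; only correlations are counted.
cooccur = defaultdict(Counter)
for sentence in sentences:
    for a, b in combinations(set(sentence), 2):
        cooccur[a][b] += 1
        cooccur[b][a] += 1

# "chased" co-occurs with "cat" in both sentences, with "mouse" in only one.
print(cooccur["chased"]["cat"], cooccur["chased"]["mouse"])  # -> 2 1
```

Scaled up to millions of documents, statistics like these capture that “cat” and “dog” behave similarly, which is where the models’ fluency comes from, even though no meaning is attached to any word.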
That gives them fluency with whatever language they’re trained on,
but it doesn’t mean
they know what’s true or false.
To get language models to generate true stories,
like summarizing documents or answering questions accurately,
it takes extra training.
The simplest thing to do without much more work
is just generate passages of text,
which are both superficially coherent and also false.
GEITGEY: So give me any headline that you want a fake news story for.
JOSS: Scientists discover flying horse.
Adam Geitgey is a software developer
who created a fake news website
populated entirely with machine-generated text.
He used a language model called Grover,
which was trained on news articles
from 5,000 publications.
GEITGEY: So go and see what we’ve got.
“More than 1,000 years ago,
archaeologists unearthed a mysterious flying animal in France
and hailed it the ‘Winged Horse of Afzel’
or ‘Horse of Wisdom’.”
GEITGEY: This is amazing, right? Like, this is crazy.
JOSS: It’s so crazy, like…
GEITGEY: It remains coherent all the way to the end, you know.
GEITGEY: “The animal, which is the size of a horse, was not easy.”
If we just Google that, like, there’s nothing.
JOSS: It doesn’t exist anywhere.
GEITGEY: And I don’t want to say this is perfect.
But just from a longer-term point of view,
of what people were really excited about three years ago
versus what people can do now, like,
this is just like a huge, huge leap.
If you read closely, you can see that
the model is describing a creature
that is somehow both mouse-like and the size of a horse.
That’s because it doesn’t actually know what it’s talking about.
It’s simply mimicking the writing style of a news reporter.
These models can be trained
to write in the voice of any source, like a Twitter feed:
“I’d like to be very clear about one thing.
Shrek is not based on any actual biblical characters,
not even close.”
Or whole subreddits:
“I found a potato on my floor.”
“A lot of people use the word ‘potato’
as an insult to imply they are not really a potato,
they just ‘looked like’ one.”
“I don’t mean insult, I mean as in the definition of the word potato.”
“Fair enough. The potato has been used in various ways for a long time.”
But we may be entering a time when AI-generated text
isn’t so funny anymore.
“Islam has taken the place of Communism as the chief enemy of the West.”
Researchers have shown that
these models can be used to flood government websites
with fake public comments about policy proposals.
They can post tons of fake business reviews,
argue with people online,
and generate extremist and racist posts
that can make fringe opinions seem more popular than they really are.
GEITGEY: “It’s all about, like, taking something you could do
and then just increasing the scale of it,
making it more scalable and cheaper.”
The good news is that some of the developers who built these language models
also built ways to detect much of the text generated through these models.
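Grover itself was designed to double as a detector of its own style of fake news, and tools such as GLTR flag text in which an unusually large share of words match a language model’s own top guesses. A toy sketch of that second idea, using bigram statistics in place of a real model (all names and data here are illustrative):

```python
from collections import Counter, defaultdict

# Reference text used to fit toy bigram statistics (stands in for a real model).
reference = "the cat sat on the mat and the dog sat on the rug".split()

counts = defaultdict(Counter)
for prev, nxt in zip(reference, reference[1:]):
    counts[prev][nxt] += 1

# The model's single most likely continuation for each word.
top_next = {word: c.most_common(1)[0][0] for word, c in counts.items()}

def top_prediction_rate(text):
    """Fraction of words that are exactly the model's top guess.
    Machine-generated text tends to score high, because it was sampled
    from a model's predictions; human text is less predictable."""
    words = text.split()
    hits = sum(
        1 for prev, nxt in zip(words, words[1:])
        if top_next.get(prev) == nxt
    )
    return hits / max(1, len(words) - 1)

print(top_prediction_rate("dog sat on the"))  # highly predictable: suspicious
print(top_prediction_rate("mat mat mat"))     # unpredictable under the model
```

Real detectors apply the same intuition with large neural models and rank statistics over thousands of tokens, but the principle is the same: generated text looks too much like what a model would generate.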
But it’s not clear who has the responsibility to fact-check the internet.
And as bots become even better mimics, with
faces like ours, voices like ours,
and now our language,
those of us made of flesh and blood may find ourselves
increasingly burdened with not only detecting what’s fake,
but also proving that we’re real.


Translation credits
Video summary: Using computer programs and artificial intelligence, machines can now easily write all kinds of articles; as the technology develops, many problems are gradually surfacing.
Transcript: collected from the web
Translator: Clio
Reviewer: 审核员SS
Video source: https://www.youtube.com/watch?v=gcHkxP9adiM