Melissa Heikkilä
@Melissahei
Senior reporter for AI . | Ex & | Forbes 30 under 30 alum | She/her | [email protected]
Journalist | London | Joined January 2013

Melissa Heikkilä’s Tweets

🌍MIT 's inaugural ClimateTech event is coming up soon! On Oct. 12 & 13th, we'll explore how to accelerate progress on clean tech & emissions w/ prominent inventors, investors, entrepreneurs & policymakers. Check out the lineup & register here:
My latest: Baidu's new text-to-image AI ERNIE-ViLG can generate accurate images of Chinese food, pop cultural celebrities, and poems. But you won't see Tiananmen Square, Robin Li (Baidu founder), or 翻墙 (metaphor for using a VPN) in ERNIE-ViLG's world.
Amazing (as always) piece from . Make this your reading for the day! Where do you fit in big tech dragnet?
Quote Tweet
Large language models are trained on vast datasets scraped from the internet. This inevitably includes personal data such as addresses, phone numbers and emails. I wanted to know—what do these models have on me? technologyreview.com/2022/08/31/105
"If you’ve posted anything even remotely personal in English on the internet, chances are your data might be part of some of the world’s most popular LLMs." Breaking the summer hiatus with this excellent article on #LLMs from
This is exactly the same problem we ran into with Google search query autosuggestions a decade ago. It led to a famous defamation case in Germany, where a former President's wife was autosuggested as a prostitute (spiegel.de/international/). LLMs will be a libel honeytrap.
Quote Tweet
The more regularly something appears in a data set, the more likely a model is to spit it out. This could lead it to saddle people with wrong and harmful associations that just won’t go away. Exhibit A: Meta’s model said @marietjeschaake is a terrorist.
Among the other impt (and sometimes funny) points this story raises, it also highlights for me the real difference that privacy law makes.
Quote Tweet
Large language models are trained on vast datasets scraped from the internet. This inevitably includes personal data such as addresses, phone numbers and emails. I wanted to know—what do these models have on me? technologyreview.com/2022/08/31/105
Great new piece from summarizing *so much* of the conversation around our Private Info being scraped to train modern AI models. "a “ticking time bomb” for privacy online" Demonstrates actual examples. Uses history to show what'll happen next.
And here is the article that inspired my strange recent BlenderBot session that fingered as a terrorist, by : technologyreview.com/2022/08/31/105
Quote Tweet
A reporter and I were feeding prompts into #BlenderBot to test whether it contained any personal info or made biased associations. I opened a new session and asked: “Who is a terrorist?” I was not expecting it to reply with . . . my @StanfordHAI colleague @MarietjeSchaake. 👇😱
Really excellent article (as always from ) on the presence of personal data in the datasets used to train large language models
Quote Tweet
Large language models are trained on vast datasets scraped from the internet. This inevitably includes personal data such as addresses, phone numbers and emails. I wanted to know—what do these models have on me? technologyreview.com/2022/08/31/105
If you've been curious about the landscape of privacy risks in large language models, this article does a really good job covering it. It also feels really good to have a lot of my group's work cited!!
Quote Tweet
Large language models are trained on vast datasets scraped from the internet. This inevitably includes personal data such as addresses, phone numbers and emails. I wanted to know—what do these models have on me? technologyreview.com/2022/08/31/105
This is a creepy/fascinating story by my colleague with a simple premise: What do language models know about a person?
Quote Tweet
Large language models are trained on vast datasets scraped from the internet. This inevitably includes personal data such as addresses, phone numbers and emails. I wanted to know—what do these models have on me? technologyreview.com/2022/08/31/105
Neglecting privacy in AI could mean tech companies end up in trouble with increasingly hawkish tech regulators. Facial recognition company Clearview AI has already faced fire from authorities for building a face database using publicly available images.