Researchers who study how information flows on the internet are bracing for the possibility that Elon Musk’s purchase of Twitter — which is not yet final — could change access to archival tweets and other data associated with the history of the social media site.
As people live more and more of their lives online, social media platforms have become massive repositories documenting shifts in society, culture and politics — from blink-and-you’ll-miss-it memes to the disinformation campaigns around the 2020 U.S. presidential election, and even Musk himself drawing the ire of the Securities and Exchange Commission over tweets promising to take Tesla private at $420 a share.
Twitter, which launched in 2006, is one of the web’s older surviving social media platforms and now counts around 400 million users. The company said Monday that it would sell itself to Musk, the world’s richest man, for about $44 billion, in the process transforming it from a public company to one controlled by a lone billionaire.
Musk is already proposing major changes to how the site runs, including making its recommendation algorithm open-source and loosening content restrictions. But the site has already run afoul of archivists and historians. In 2021, when Twitter banned then-President Donald Trump, it deleted his tweets without sending copies to the National Archives as required by law — and it explicitly refused to let the agency resurrect the tweets for documentation.
“Corporate internet giants have taken over the public role of archives and libraries and have their own policies on how the public record will be shaped and who will access it,” said Anat Ben-David, an associate professor at the Open University of Israel who focuses on the history and geopolitics of the web. “They’re actually changing the very meaning of the public record.”
Social media sites have grown from internet upstarts to major channels of communication, alongside public radio, television and even newspapers, said Amelia Acker, an assistant professor at the University of Texas at Austin’s School of Information who studies how new forms of data are preserved. But those old-school communications formats are governed by laws and expectations; that’s not true of social media. Yet Twitter is where we hear from elected officials, where governments communicate about emergencies and where regular people post details of their experiences, from the pedestrian to firsthand accounts of war.
“There are stakes there when our public sphere is governed by private corporations that can be owned and acquired by individuals,” said Acker.
When researchers access data on a platform or automatically crawl it to preserve it at places such as the Internet Archive, they are in a push-and-pull with private companies. Facebook notoriously shut down a research project on political ads on the platform by disabling access to site data. It’s a constant concern for researchers who study social media, Acker said. The “more research we produce, the more it risks them pulling back data,” she added.
Documenting the living web
Twitter’s application programming interface (API), which lets outside apps interact with the social media site’s underlying data, is primarily designed for developers rather than researchers. This makes replicating studies and archiving tweets difficult over the long term, because the API changes without warning and users can delete their tweets at will. Researchers and archivists can download public tweets, but to replicate a study, they’d have to download data sets that are constantly shifting in real time.
“Much of the public sphere is now existing in the Twitter archives,” said Michael Nelson, a professor of computer science at Old Dominion University. “It was of concern prior to someone as mercurial as Elon Musk buying it, but now I think everyone is trying to assess what will happen with that information, because there’s potentially damning information, direct messages, and things that have been deleted and not otherwise archived.”
For example, Musk, who has 85.8 million Twitter followers, often deletes his own tweets. In the run-up to the Twitter deal, he spent a weekend posting and then deleting tweets that were critical of the platform. Tellingly, the SEC filing on the deal to acquire Twitter includes a provision that Musk can tweet about the deal but not disparage the company.
Nelson said there is a mismatch between how Twitter renders its pages and how standard web archiving tools can capture and interact with the site. While that has improved somewhat, “mostly we can archive little pockets of Twitter,” he said.
He co-authored a 2021 study that examined how a user interface update to Twitter in 2020 resulted in archived Twitter pages displaying Twitter’s “Something went wrong” error message rather than the archived tweet. Earlier this year, Twitter reversed a product decision that had caused deleted tweets embedded in articles or websites to appear as a white box. Adding an edit button to Twitter, an idea that Musk has publicly entertained in recent weeks, would also make documenting the web more challenging.
“The web is a suicidal medium” given that the average website exists for only 18 months, Ben-David said. And when sites go down, they take information with them. She points to “reference rot” in legal citations, in which pointers to news stories or other references break because the link no longer exists or the address has changed.
“These platforms have totalitarian control of what is public knowledge, and it’s concerning,” Ben-David added. “Now, with Musk taking over Twitter, I personally feel that this concern becomes even greater.”
Acker is less worried that Musk will fundamentally change access to data. She’s taking more of a “wait and see” approach.
But Nelson has two main concerns about the fate of Twitter data going forward. First, if Musk finalizes the purchase and converts Twitter to a private company not particularly answerable to anyone else, then Twitter’s own archives become suspect. Even if an account is not deleted or suspended, Nelson said he would not be totally sure the content wasn’t modified. Second, the company could make it difficult for outside researchers or groups to archive tweets.
“To the extent that Twitter could potentially make it difficult, both technically and legally, for information to be archived, that would be the second part of this pincer movement, if you will,” said Nelson.
“If we can’t trust the company’s own archives, and if independent third-party archives become difficult, technologically, legally or some combination thereof,” he said, “then we have this incredibly public resource in which it’ll be hard to trust the replay of the content from it.”
Thanks to Lillian Barkley for copy editing this article.