The rapid evolution of data scraping technologies (the automated retrieval of data from websites using software or technical tools) is forcing us to reconsider some evergreen questions: What content is public? What is private? How do we decide?
On June 3, 2022, CLTC brought together a pair of civil society experts from Colombia and Brazil to discuss how these questions are playing out in Latin America. Moderated by Tejas Narechania, Assistant Professor of Law at the UC Berkeley School of Law, the panel featured Alice de Perdigão Lana, Culture and Knowledge Coordinator for InternetLab, an independent research center that aims to foster academic debate around issues involving law and technology, especially internet policy; and Pilar Saenz, Digital Security and Privacy Laboratory Coordinator for Fundación Karisma, a Colombian civil society organization that seeks to respond to the threats and opportunities that “technology for development” poses to the exercise of human rights.
The panel was part of a series of discussions on data scraping that CLTC has convened throughout the first half of 2022. (Recordings and recaps are available from the February panel on “Data Scraping & the Courts: State of Play with the CFAA” and the April panel, “Data Scraping for Research Purposes.”)
“So far in this series, we’ve been talking about data scraping mostly in domestic contexts here in the United States,” Narechania explained in his opening remarks. “[Our panelists] are at civil society organizations in Brazil and Colombia, respectively, looking at how digital technologies affect things like democracy, privacy, and freedom of expression.”
Saenz explained that her work focuses on civic participation, including issues such as state surveillance and the use of technology in the electoral process. She said that, while her organization does not use data scraping, it has gathered data from government sources, such as contracts and audits, and made it public to promote transparency and accountability. “We need to refocus on how the technology is going to be used and try to understand better how data is created to give citizens more information,” Saenz said.
Alice de Perdigão Lana, author of the book Women Exposed: Revenge Porn, Gender, and the Marco Civil da Internet, expressed gratitude for the invitation to the panel, noting that voices from outside the U.S. and Europe are too often not included in decision-making related to technology. “When we’re talking about the perspective from the Global South, more often than not, legislation that can have a great impact here is made without hearing from us,” she said.
De Perdigão Lana explained that her research team used data scraping for a project called “Monitor,” an “observatory of political violence against female candidates on social networks.” The research was based on an analysis of hundreds of thousands of comments directed at candidates on platforms such as Twitter, Instagram, and YouTube.
While technology allowed her team to aggregate the data and perform one level of analysis using automated tools, the final analysis of the content was only possible with the contributions of a team of journalists, linguists, and others, de Perdigão Lana explained.
De Perdigão Lana also noted that the Portuguese word for data scraping directly translates as “scratching,” as in “removing a very thin layer, almost insignificant,” and that the term may be misleading. “We as organizations that work with digital rights, sometimes I have to say, it’s not insignificant data that you’re collecting.”
She also noted that many Brazilian citizens are unaware of the extent to which their data may be collected, and that laws such as the LGPD, Brazil’s General Data Protection Law, may not offer sufficient protection if they are not properly enforced. “There has not been really a very heated debate because there is a general feeling that the law already covers what needs to be covered,” she said. “The next step is to implement it in practice. And as happens [in] a lot of countries from the Global South, most citizens don’t really understand this. Since I work with data, I’m very worried about my privacy, online and offline as well,” she added. “I talk about that with friends and colleagues, and people don’t really care that their data is being collected or scraped or mistreated in any way, sadly.”