Responsible AI relies on Data Literacy

June 08, 2018

Elena Simperl is a professor of computer science at the University of Southampton and a member of its Web Science Institute. She will attend SEMANTiCS 2018 and we are looking forward to her keynote speech about her professional experiences with the Semantic Web. In this interview she talks about her role and experiences as an evangelist of linked and open data at the cross-section of academia and business, the importance of data literacy and the benefits of diversity.

You are a distinguished professor in computer science and an important protagonist in driving data-driven business at the intersection of academia and the start-up scene. What does a successful cooperation between universities, companies and funding look like?

Impact is very important. When you do fundamental research you already have to think about that delivery pipeline, which includes not just you and your team but colleagues from other universities or other types of stakeholders. Southampton has an impressive track record in working with a variety of stakeholders, especially when it comes to open and linked data. Before joining them, I had the opportunity to work with the Open Data Institute in London, and I set up the Open Data Incubator program, which was co-funded by Horizon 2020. Our goal was to encourage small and medium-sized businesses to use open data, so we provided incentives and support for that. For me as an academic it was very rewarding to see some of the things I have been doing and the data I know about being used for creating successful businesses. I learned a lot by communicating and engaging with the startups and getting a sense of the new scenarios in which they use open data and innovate with data in general. This has inspired some of the research we are doing. Such interfaces can work well. In our case, we have the instruments and resources to bootstrap these interactions and this is the true leverage.

Lots of data is generated, but companies are still at the beginning of making use of it. You evangelize Open Data, which means making datasets publicly available. What kinds of business models and ecosystems can be built upon Open Data?

Lots of the businesses we are working with are aiming for a broad impact. It’s not just about profit. They care about people and the planet. Interestingly, their business models are not really that different from any other data or digital startup. Few of them just produce a data set. Rather, they offer advanced analytics and data services on top of open data resources. Sometimes they use open data alongside other data sets to create a product in a vertical, whether this is precision farming, sports, wellbeing, healthcare or any other sector. Open data is a resource which can be integrated and mixed with other data sets to create a product.

Due to recent events such as the data breach at Facebook, or the changes in the legal framework due to GDPR, awareness about data privacy is growing. What changes do you expect in online user behaviour and will it have an impact on business?

I think it will change the business models of the companies that have been very profitable over the last decade by selling or exploiting data of their users. I am not sure if this is going to be because of the feedback they get from their customers (the users) or whether this is a consequence of public opinion, media coverage, and regulations like GDPR. I appreciate that there are more and more discussions around data privacy and the importance of it. But at the same time it is contradictory to see how people act when they talk about privacy and how little aware they are of what happens to their data. We have to wait and see whether citizens really become more cautious about whom they give their data to and in which ways. A positive side effect of this discussion will be more opportunities for users to manage their personal data if they are indeed interested. We are going to see a rise in the personal data economy where interested users will be more empowered than ever before. I don’t know though whether this is going to be mainstream, or something that will be pushed through education, incentives, or by punishing corporations that do not respect these rights. This is just the beginning of a journey.

You specialize, amongst other technology fields, in semantic technologies. How do these technologies complement Artificial Intelligence, Machine Learning and Data Science?

All these technologies complement each other. Semantics is as much part of AI as machine learning. Semantic technologies have been part of AI since the very beginning. One of the reasons AI has not been so successful so far is that a lot of investment and effort was put into trying to capture the world in very complex knowledge systems. It was impossible with the technology and the systems we had in the 60s and 70s though. Now we face a completely different situation: Everyone has their devices and access to the web. There is the Internet of Things. We are living in a world of networks and it is much easier to capture the data. You can work with really powerful knowledge-based systems that would not just learn without understanding the results but provide the user with an interpretation of what is learned, and use knowledge that they have about their surroundings to enrich the results of machine learning.

The relationship with data science is slightly different in the sense that I see the semantic web primarily as focusing on particular types of data, mostly graph-shaped data. There is a focus on data integration, but at the same time there is a lot of work going into machine learning from knowledge graphs. That shows very well how data analytics, knowledge representation, and knowledge technologies can complement each other.

Which technology trends that are maybe not so intensively discussed (yet) do you consider particularly important?

The “fairness of algorithms” is getting some attention recently. People are now more aware of the biases in the way algorithms work. The quality of data that is used to train an algorithm plays a really important role in how those algorithms work and make decisions. Currently, there is not enough data and evidence to come up with a solution for that. In this area we really need empirical evidence, studies and multidisciplinary theories to make algorithms more transparent and fair. At the same time, there are many other organizations that don’t even have enough data available to make substantial progress with machine learning. The relationship between training data used in machine learning algorithms and the performance of these algorithms is crucial.

The work environment is becoming increasingly automated and supported by AI-driven applications. Do you consider this a threat or an opportunity? How do you think we will collaborate in the near future?

AI is an opportunity. If we have learned anything from the discussions around monopolies on the internet, data silos, breaches of privacy, and the ways that these big platforms can be misused, it is that it is never too early to think about potential threats. AI is already in the marketplace in many areas. In finance, algorithms have probably been making many decisions for a decade. We really need to think carefully and study the problems and implications technologies can have in the marketplace. Research has to provide evidence and recommendations for what can be done.

You are one of the keynote speakers of this year’s SEMANTiCS conference, where a mixed audience of tech and business people is expected. Why are technology and data literacy important for non-experts?

In a professional context we see an increase in automation in areas ranging from anything related to office productivity to more expert kinds of jobs. Algorithms are quite good at going through large amounts of data and perhaps taking decisions and processing information. People will have to work alongside algorithms, and for that they will need to have some basic understanding of data, data science, and AI. Even if you move away from the professional context, every citizen is affected by these technologies. We have them in our homes. I want to know which data is collected by my IoT devices and which data is shared with whom. Having at least a very basic understanding of how this works is very important.

You are successful in academia and the tech industry. Both domains are still male dominated. Which challenges did you face and what do you consider necessary to reach gender equality in the work environment?

I don’t think I have faced specific challenges myself. Maybe I was fortunate, maybe it was a combination of circumstances and the skills I had. However, it’s true that I was often the only female in a team of male colleagues. This resulted in discussions being run in a particular way: There were implicit or explicit biases that led to less effective results than we could potentially have reached.

The fundamental problem starts very early. In many primary and secondary schools and in higher education there are limited opportunities for girls and other underrepresented groups to learn computer science. That needs to change. We also need to change our way of communicating with these groups from early on. You detect implicit biases even in textbooks, in which gender is represented as being exemplary for a specific area. We need to be much more careful with these messages that we implicitly send to people.

In general, it’s a no-brainer that a lack of diversity leads to less creativity and productivity.


Discuss the potential of artificial intelligence and semantic technologies with Elena at SEMANTiCS 2018. Register now!
 

About SEMANTiCS

The annual SEMANTiCS conference is the meeting place for professionals who make semantic computing work, understand its benefits, and know its limitations. Every year, SEMANTiCS attracts information managers, IT architects, software engineers, and researchers from organisations ranging from NPOs, universities, and public administrations to the largest companies in the world.

http://www.semantics.cc