
Gauging Reputation Risk using Knowledge Graphs

According to the World Economic Forum, more than 25% of a company's market value is directly linked to its reputation. Reputation matters to every organisation, but very few have a plan to address reputation risks. There is no industry-standard framework for reputation risk management, so organisations have to reinvent the wheel every time they develop one. We at Cleareye have developed a framework that calculates the risk exposure of any organisation using events picked up from news, social feeds and other sources.

Cleareye is an AI startup that enables banks to quickly launch tailored products in weeks while delivering unforgettable customer experiences. Please visit our website to learn more about us.

What is Reputation Risk: Reputation risk is the potential loss of financial capital, social capital and/or market share resulting from damage to a firm's reputation. A reputation risk event can be anything from a top executive being involved in a bribery case to a cyber attack on one of the firm's strategic business partners.

The reputation of an organisation sits in the collective thoughts and feelings of a broad set of stakeholders. Each organisation therefore develops its own risk framework to monitor its environment and automate tracking.

The Cleareye framework provides a solution that automates this hectic process. The tool extracts news articles, classifies them, generates features and loads the information into a knowledge graph. In the graph, connections form between different risk events, and our proprietary algorithm assesses the relative impact of each risk event.

This article is intended to give you an idea of our approach and to detail the steps in the process. For this proof of concept we picked three organisations, Cognizant, Ellie Mae and FICO, and calculated their risk exposure. The content is divided into the following sections.

  1. Data Extraction/Classification
  2. Feature Generation
  3. Data loading into Graph
  4. Risk Scoring Model
  5. Visualising Results

The high-level architecture can be seen in the below diagram.

High-level architecture

1. Data Extraction/Classification

Initially we identified certain news websites and added them to our configuration file. We then used the Python newspaper library to scrape news from these sites. This did not yield adequate data, so we went further and used the Google API to fetch more articles from Google News.

We collected such data for all three organisations from more than 350 news sites, which was adequate for our implementation. The next challenge was data cleaning. We filtered irrelevant content out of the extraction output (Google News returns all sorts of news) and then removed special characters and extra spaces. We then performed statistical analysis to check for outliers and detect noise. Once the data was cleaned, we labelled it for classification. Figure 1 shows the data extraction and classification pipeline.
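The character-level part of the cleaning step can be sketched as a small helper. This is a minimal illustration of removing special characters and collapsing extra whitespace, not Cleareye's exact cleaning pipeline:

```python
import re

def clean_article(text: str) -> str:
    """Strip special characters and collapse extra whitespace.
    Illustrative only; the allowed-character set here is an assumption."""
    # Replace anything outside letters, digits, common punctuation and spaces
    text = re.sub(r"[^A-Za-z0-9.,;:!?'\"()\- ]+", " ", text)
    # Collapse runs of whitespace left behind into single spaces
    text = re.sub(r"\s+", " ", text)
    return text.strip()

print(clean_article("Cognizant\xa0 hit by *** ransomware attack!"))
# → Cognizant hit by ransomware attack!
```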

Figure 1: Data extraction and classification steps

Reputation risk can be further classified into various risk subcategory types. The diagram below gives an overall idea of the different types (Figure 2).

Figure 2: Risk subcategory types

Our first step in data classification was manually labelling the news as either Risk Events or General News. Risk Events are any news items that fall under the definition of a risk subcategory type (Figure 2); General News items are events associated with the organisation that do not qualify as risk events (explained later in this article). This segregation lets us quantify risk through the impact of general news associated with the organisation in the market. The risk events were further classified into the subcategory types below.

  1. Employment Risk
  2. Staff Risk
  3. Brand & Media Risk
  4. Financial Risk

The sequence of steps and statistics on the collected data are shown in Figure 1.

2. Feature Generation

To enrich the graph with information we need to extract features from the labelled data. After a careful study of the data and available methods, we decided to generate a sentiment score, named entities and keywords as features.

For the sentiment score we performed sentiment analysis with the FinBERT model. FinBERT is a BERT model pretrained on financial data. We fine-tuned it on our training data for better results.
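A FinBERT-style classifier outputs probabilities over positive, neutral and negative classes. One hedged way to collapse these into a single signed score (our assumption for illustration, not necessarily the exact mapping used here) is positive mass minus negative mass:

```python
def sentiment_score(probs: dict) -> float:
    """Collapse class probabilities from a FinBERT-style sentiment classifier
    into one signed score in [-1, 1]: positive mass minus negative mass.
    `probs` maps label -> probability and is assumed to sum to 1."""
    return probs.get("positive", 0.0) - probs.get("negative", 0.0)

# e.g. a strongly negative headline about a ransomware attack
print(round(sentiment_score({"positive": 0.03, "neutral": 0.12, "negative": 0.85}), 2))
# → -0.82
```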

Named entities were extracted using spaCy's pretrained NER model. For this project we considered only person, location and geopolitical entities.

spaCy NER could capture only a few entity types, such as geopolitical entities, which is not sufficient to form strong connections in the graph. Hence we also extract important keywords from the news data, again using BERT. The advantage of BERT is that, unlike methods such as RAKE that rely on statistical properties for keyword extraction, it is based on semantic similarity.
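The similarity-based extraction follows the KeyBERT-style idea: embed the document and each candidate phrase, then rank candidates by cosine similarity to the document. A minimal sketch with toy stand-in vectors (in practice the embeddings would come from a BERT encoder):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def top_keywords(doc_vec, candidates, k=2):
    """Rank candidate phrases by cosine similarity of their embedding
    to the document embedding, keeping the top k."""
    ranked = sorted(candidates, key=lambda c: cosine(candidates[c], doc_vec), reverse=True)
    return ranked[:k]

# Toy stand-in embeddings; real ones would be BERT sentence vectors.
doc = [0.9, 0.1, 0.2]
cands = {
    "ransomware":   [0.80, 0.20, 0.30],
    "weather":      [0.10, 0.90, 0.00],
    "cyber attack": [0.85, 0.10, 0.25],
}
print(top_keywords(doc, cands))
# → ['cyber attack', 'ransomware']
```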

Once the features are generated, they are loaded into the Neo4j graph along with the respective news item.

3. Loading Data into the Neo4j Graph

A sample structure of the information loaded into Neo4j is shown below.
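As an illustration of what such a load might look like, the sketch below builds a Cypher `MERGE` statement for one news item with its features. The node labels and relationship types (`News`, `Entity`, `Keyword`, `MENTIONS`, `HAS_KEYWORD`) are our assumptions for this example, not Cleareye's actual schema:

```python
def news_to_cypher(news_id, category, sentiment, entities, keywords):
    """Build an illustrative Cypher statement linking a news node to its
    extracted entities and keywords. Schema names are hypothetical."""
    lines = [
        f"MERGE (n:News {{id: '{news_id}', category: '{category}', sentiment: {sentiment}}})"
    ]
    for i, e in enumerate(entities):
        lines.append(f"MERGE (e{i}:Entity {{name: '{e}'}})")
        lines.append(f"MERGE (n)-[:MENTIONS]->(e{i})")
    for i, k in enumerate(keywords):
        lines.append(f"MERGE (k{i}:Keyword {{text: '{k}'}})")
        lines.append(f"MERGE (n)-[:HAS_KEYWORD]->(k{i})")
    return "\n".join(lines)

print(news_to_cypher("n42", "Brand & Media Risk", -0.82,
                     ["Cognizant"], ["ransomware"]))
```

In a real pipeline this statement would be executed through the Neo4j Python driver (ideally with query parameters rather than string interpolation).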





4. Risk Scoring Model

This is the most important part of the exercise: quantifying each risk event based on its relative impact. After considerable analysis and reading, we identified the factors below as influencing the risk score.

  • Sentiment score of the risk event: the sentiment score gives the general emotion of the news, whether it is positive, neutral or negative.
  • Impact score based on the "relatedness" of the risk event to other news in the graph. Depending on the relatedness distance, the impact can be direct or indirect.
  • Risk subcategory precedence: a predefined precedence level based on which type of risk matters most to an organisation. On that basis we set priority levels for all four risk subcategory types.

To explain further on the relatedness and impact, consider a subgraph which has a risk event connected to the news through entities.

Figure 3: Representing direct impact

Here the risk event (orange node) is a cybersecurity event and falls under "Brand & Media Risk". It is connected to four other news items through the keyword "ransomware". The news items (green nodes) describe some of the consequences of the risk event. This is a direct impact because the risk event is directly linked to these news items through the "ransomware" keyword. The idea here is that the sentiment of the risk event alone gives little insight into its impact, because "ransomware" carries no semantic meaning by itself. The connected news gives us more insight into the ransomware's impact, and that in turn shapes the risk score.

Let's take another example to highlight indirect relatedness. Consider the diagram below.

Figure 4: Representing indirect impact

Using this subgraph, we have to identify the impact of the risk event (node 1). Node 1 and node 3 are indirectly connected through the entities "Brian Humphries" and "Vodafone". To put it in perspective: to analyse the impact of a risk event we need more information about the entity in the news. In this case that entity is "Brian Humphries", and through the graph connections the algorithm identifies that he previously worked at Vodafone and relates those news items. This is an example of indirect impact.

In a graph with a lot of information, every node can end up connected to every other node through a series of intermediate entities, and not all of those connections represent a relevant impact. During our analysis we found that connections with larger path lengths were mostly invalid, so we penalised our impact score by path length. To identify the impact due to each node we used the shortest-path graph algorithm.
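The shortest-path lookup and the path-length penalty can be sketched with a plain BFS over an adjacency dict. The 1/path-length decay here is our assumed penalty form for illustration; the article does not specify the exact function:

```python
from collections import deque

def shortest_path_len(graph, src, dst):
    """BFS shortest path length over an unweighted adjacency dict,
    or None if the two nodes are not connected."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def penalised_impact(sentiment, path_len):
    """Hypothetical penalty: a related news item's impact decays
    with its path distance from the risk event."""
    return abs(sentiment) / path_len

# Figure 4 style subgraph: risk event linked to news3 via two entities
g = {"risk": ["Brian Humphries"],
     "Brian Humphries": ["news2", "Vodafone"],
     "Vodafone": ["news3"], "news2": [], "news3": []}
d = shortest_path_len(g, "risk", "news3")
print(d, round(penalised_impact(-0.6, d), 2))
# → 3 0.2
```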

Our risk scoring formula can be represented diagrammatically as shown in the image below.

Figure 5: Risk scoring formula
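The actual formula is proprietary, but a minimal sketch can show how the three stated factors might combine: the event's own negative sentiment, the path-length-penalised impacts of related news, and the subcategory precedence weight. Every weight and the scaling below are assumptions for illustration only:

```python
def risk_score(sentiment, related_impacts, precedence_weight):
    """Illustrative combination of the three stated factors; the real
    Cleareye formula is proprietary and may differ entirely.
    Scaled to roughly the article's 0-20 exposure range."""
    base = abs(min(sentiment, 0.0))      # only negative sentiment contributes risk
    related = sum(related_impacts)       # impacts already penalised by path length
    return round(precedence_weight * (base + related) * 10, 1)

# e.g. a "Brand & Media Risk" event (assumed weight 1.0) with two related news items
print(risk_score(-0.82, [0.4, 0.2], 1.0))
# → 14.2
```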

5. Visualising Results

After calculating the risk exposure score for every risk event, we analysed the risk outlook for each vendor. We scored risk exposure on a range of 0-20 and visualised it across the partner networks. We also broke the analysis down by vendor and by risk subcategory type.

We used a Streamlit application to present and share the results. You can take a look at our analysis using the link below.

Reputation Risk Exposure

About the Author
Sundararaman Parameswaran is a Machine Learning Engineer at Cleareye. Sundararaman is interested in natural language processing and computer vision.

