Scoring (violin.scoring)
This page details the scoring functions of VIOLIN.
Match Score
The Match Score (SM) measures how many new nodes are found in the reading with respect to the model. For an interaction from the reading A → B, where A is the regulator and B is the regulated node, the calculation distinguishes four cases that determine the score:
Both A and B are in the model
A is in the model, B is not
B is in the model, A is not
Neither A nor B is in the model
The default Match Score values assume that the user wants to extend a given model without adding new nodes that may not be useful to the network. New regulators and new edges between existing model nodes are therefore considered most important.
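The four cases can be told apart with two membership tests. The sketch below uses a hypothetical helper, `match_case` (not part of the VIOLIN API), and assumes the model's nodes are available as a set of names. The returned case number follows the order of the list above; the actual score for each case would come from `match_values`.

```python
from typing import Set

def match_case(regulator: str, regulated: str, model_nodes: Set[str]) -> int:
    """Return which Match Score case (1-4) an A -> B reading interaction falls into.

    Hypothetical helper: cases are numbered in the order of the documented list.
    """
    a_in = regulator in model_nodes
    b_in = regulated in model_nodes
    if a_in and b_in:
        return 1  # both A and B are in the model
    if a_in:
        return 2  # A is in the model, B is not
    if b_in:
        return 3  # B is in the model, A is not
    return 4      # neither A nor B is in the model
```

For example, with `model_nodes = {"A", "B"}`, the interaction X → B falls into case 3 (new regulator of an existing node).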
Kind Score
The Kind Score (SK) compares the edges of a reading interaction (LEE) with the corresponding model interaction (MI). It assigns each interaction a classification and, when the LEE is indirect, searches for paths between the corresponding nodes in the model. Under the same assumption as the Match Score calculation, the Kind Score distinguishes the following scenarios:
| Classification | Definition |
|---|---|
| Corroboration | LEE matches MI |
| Extension | LEE contains information not found in the model |
| Contradiction | LEE disputes information in MI |
| Flagged | Must be judged manually |
Each classification is further divided into subclassifications, which allow more detailed scoring if the user wishes.
Corroborations
Strong Corroboration: LEE matches MI exactly
Weak Corroboration Type 1: LEE matches the direction, sign, connection type, and node type of a model interaction but is missing additional attributes
Weak Corroboration Type 2: an indirect LEE matches the direction and sign of a direct model interaction with non-contradictory attributes
Weak Corroboration Type 3: an indirect LEE matches the direction and sign of a path in the model with non-contradictory attributes
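For Weak Corroboration Type 3, the sign of a model path can be taken as the product of its edge signs. This is a minimal sketch with networkx. It assumes a hypothetical `sign` edge attribute of +1 (activation) or -1 (inhibition) and a hypothetical cutoff of 4 edges on path length. It does not separate direct edges (Type 2) from longer paths, and it ignores the attribute checks.

```python
import networkx as nx

def path_signs(graph: nx.DiGraph, source: str, target: str, cutoff: int = 4) -> set:
    """Collect the overall sign of every simple path from source to target.

    Assumes each edge has a hypothetical 'sign' attribute of +1 or -1;
    a path's sign is the product of its edge signs.
    """
    if source not in graph or target not in graph:
        return set()
    signs = set()
    for path in nx.all_simple_paths(graph, source, target, cutoff=cutoff):
        sign = 1
        for u, v in zip(path, path[1:]):
            sign *= graph[u][v]["sign"]
        signs.add(sign)
    return signs

def corroborated_by_path(graph: nx.DiGraph, source: str, target: str, lee_sign: int) -> bool:
    """True if some model path from source to target has the same sign as the LEE."""
    return lee_sign in path_signs(graph, source, target)
```

In a model A → B (activation), B → C (inhibition), an indirect LEE "A inhibits C" matches the path sign, while "A activates C" does not.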
Extensions
Full Extension: Neither source nor target of the LEE is in the model
Hanging Extension: The target of the LEE is in the model
Internal Extension: Both the source and target of the LEE are in the model, but there is no model interaction between them
Specification: LEE contains more information (attributes) than MI, or shows a direct relationship where the model contains only a path
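The three structural extension subtypes depend only on which endpoints of the LEE appear in the model and whether the model already links them. This is a minimal sketch, assuming the model is supplied as a set of node names and a set of `(source, target)` edge tuples (both hypothetical inputs). It returns `None` when none of the subtypes above applies. That includes the case where only the LEE's source is in the model, which the definitions above do not cover, and the case where a model interaction already exists.

```python
from typing import Optional, Set, Tuple

def extension_type(source: str, target: str, model_nodes: Set[str],
                   model_edges: Set[Tuple[str, str]]) -> Optional[str]:
    """Classify an LEE as a full, hanging, or internal extension, or return None."""
    src_in = source in model_nodes
    tgt_in = target in model_nodes
    if not src_in and not tgt_in:
        return "full extension"      # neither endpoint is in the model
    if tgt_in and not src_in:
        return "hanging extension"   # only the target is in the model
    linked = (source, target) in model_edges or (target, source) in model_edges
    if src_in and tgt_in and not linked:
        return "internal extension"  # both in the model, no interaction between them
    return None                      # not covered by the structural extension subtypes
```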
Contradictions
Direction Contradiction: The target and source of the LEE correspond to the source and target of the model interaction, respectively
Sign Contradiction: The regulation sign of the LEE is opposite of the corresponding model interaction (e.g. the LEE shows a positive regulation where the model interaction shows negative)
Attribute Contradiction: One or more of the LEE node attributes differ from those in the corresponding model interaction
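Once an LEE has been paired with a model interaction, the three contradiction subtypes reduce to simple comparisons. This sketch assumes both interactions are plain dictionaries with hypothetical `source`, `target`, and `sign` keys, plus whichever attributes are being compared. The returned labels reuse the contradiction key names from the default Kind Score dictionary. A missing (`None`) attribute on either side is treated as non-contradictory.

```python
from typing import Dict, List

def contradictions(lee: Dict[str, object], mi: Dict[str, object],
                   attributes: List[str]) -> List[str]:
    """Return the contradiction subtypes that apply to an LEE/model-interaction pair."""
    found = []
    # Direction: LEE source/target are the model's target/source, respectively.
    if lee["source"] == mi["target"] and lee["target"] == mi["source"]:
        found.append("dir contradiction")
    # Sign: e.g. positive regulation in the LEE, negative in the model.
    if lee["sign"] != mi["sign"]:
        found.append("sign contradiction")
    # Attributes: both sides specify a value and the values differ.
    for att in attributes:
        lee_val, mi_val = lee.get(att), mi.get(att)
        if lee_val is not None and mi_val is not None and lee_val != mi_val:
            found.append("att contradiction")
            break
    return found
```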
Flagged
Flagged Type 1: An LEE with mismatched direction and non-contradictory other attributes, where the corresponding model interaction has a direct connection type
Flagged Type 2: An LEE with a corresponding model path that has one or more mismatched attributes
Flagged Type 3: An LEE that is a self-regulation under the model's definition of elements (e.g. the LEE has caspase-8 → caspase-3, but the model considers caspase-8 and caspase-3 to be the same element)
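Flagged Type 3 depends on how reading names map onto model elements. This is a minimal sketch, assuming a hypothetical `name_to_element` dictionary that maps each name found in the reading to the model element it belongs to. Names with no mapping are never treated as self-regulation.

```python
from typing import Dict

def is_self_regulation(source: str, target: str, name_to_element: Dict[str, str]) -> bool:
    """True if the LEE's source and target map to the same model element."""
    src_el = name_to_element.get(source)
    tgt_el = name_to_element.get(target)
    return src_el is not None and src_el == tgt_el
```

With `{"caspase-8": "CASP", "caspase-3": "CASP"}`, the LEE caspase-8 → caspase-3 would be flagged, matching the example above.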
Evidence Score
The Evidence Score (SE) measures how many times an LEE is found in the machine reading output. In the violin.formatting.evidence_score() function, the user specifies the column names that determine which LEEs count as duplicates. For example, comparing all LEE attributes and all columns of the machine reading spreadsheet means that only exact matches between LEEs are counted as duplicates. Specifying fewer attributes instead gives a coarser-grained Evidence Score.
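The duplicate counting described above can be expressed as a pandas group-by over the chosen columns. This is a sketch under assumptions, not violin.formatting.evidence_score() itself. The helper name `evidence_counts` and the column names in the usage are hypothetical, and the sketch assumes the grouping columns contain no missing values.

```python
import pandas as pd

def evidence_counts(reading: pd.DataFrame, columns: list) -> pd.Series:
    """For each row, count how many rows share its values in `columns`.

    More columns give a finer-grained count; fewer columns merge more rows.
    """
    return reading.groupby(columns)[columns[0]].transform("size")
```

For example, two rows that agree on source, target, and sign each get a count of 2. Grouping on the source column alone merges all rows with the same regulator.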
Epistemic Value
In the NLP output, we sometimes receive an Epistemic Value (SB), which is a measure of the believability of an interaction in the LEI. Zero, Low, Moderate, and High believability correspond to numerical scores of 0.0, 0.33, 0.67, and 1.0, respectively.
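A sketch of the conversion, assuming the believability levels arrive as strings such as "High" (the helper name `epistemic_score` is hypothetical). The fallback of 1.0 follows the epistemic_value default described under Functions.

```python
from typing import Optional

# Numerical score for each believability level, as listed above.
EPISTEMIC_LEVELS = {"zero": 0.0, "low": 0.33, "moderate": 0.67, "high": 1.0}

def epistemic_score(level: Optional[str]) -> float:
    """Map a believability label to its numerical Epistemic Value."""
    if level is None:
        return 1.0  # no Epistemic Value available: default to 1
    return EPISTEMIC_LEVELS[level.strip().lower()]
```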
Total Score
The total score (ST) is calculated by
Functions
- violin.scoring.match_score(x: int, reading_df: pandas.core.frame.DataFrame, model_df: pandas.core.frame.DataFrame, match_values: Optional[dict] = None) → int
This function calculates the Match Score for an interaction from the reading.
- Parameters
x (int) – A row index of the dataframe of Interaction set (IS) to be scored
reading_df (pd.DataFrame) – The reading dataframe
model_df (pd.DataFrame) – The model dataframe
match_values (dict) – Dictionary assigning Match Score values. Default values found in MATCH_DICT.
- Returns
match – Match Score value
- Return type
int
- violin.scoring.kind_score(x: int, model_df: pandas.core.frame.DataFrame, reading_df: pandas.core.frame.DataFrame, graph: networkx.classes.digraph.DiGraph, counter: dict, kind_values: Optional[dict] = None, attributes: Optional[list] = None, classify_scheme: str = '1', mi_cxn: str = 'd') → int
This function calculates the Kind Score for an interaction in the Interaction Set (iIS). The Kind Score represents the subclassifications described above. For further details, see: https://www.biorxiv.org/content/10.1101/2024.07.21.604448v1.
- Parameters
x (int) – The row index for an iIS.
model_df (pd.DataFrame) – The model dataframe
reading_df (pd.DataFrame) – The reading dataframe.
graph (nx.DiGraph) – A directed graph of the model, used when the function calls the path_finding module.
counter (dict) – A dictionary recording the interactions identified as corroborated or contradicted in the model. Default value is None.
kind_values (dict) – Dictionary assigning Kind Score values. Default values found in KIND_DICT_A and KIND_DICT_B.
attributes (list) – A list of attributes compared between the model and the machine reading output. Default is None.
classify_scheme (str) – The scheme of the classification (‘1’, ‘2’, and ‘3’). Default is ‘1’.
mi_cxn (str) – The connection type assigned to model interactions when none is available. Accepted values are “d” (direct) or “i” (indirect). Default is “d”.
- Returns
kind – Kind Score value.
- Return type
int
- violin.scoring.epistemic_value(x: int, reading_df: pandas.core.frame.DataFrame) → float
Finds the epistemic value of the interaction in Interaction Set (IS) (when available).
- Parameters
x (int) – The row index for an iIS.
reading_df (pd.DataFrame) – An IS dataframe.
- Returns
e_value – The Epistemic Value; if there is no Epistemic Value available for the reading, default is 1 for all interactions in IS.
- Return type
float
- violin.scoring.score_reading(reading_df: pandas.core.frame.DataFrame, model_df: pandas.core.frame.DataFrame, graph: networkx.classes.digraph.DiGraph, counter: Optional[dict] = None, kind_values: Optional[dict] = None, match_values: Optional[dict] = None, attributes: list = [], classify_scheme: str = '1', mi_cxn: str = 'd') → pandas.core.frame.DataFrame
This function creates new columns for the Match Score, Kind Score, Epistemic Value, and Total Score. It calls the scoring functions and stores the values in the appropriate columns.
- Parameters
reading_df (pd.DataFrame) – The reading dataframe.
model_df (pd.DataFrame) – The model dataframe.
graph (nx.DiGraph) – A directed graph of the model, required by the kind_score function.
counter (dict) – A dictionary for counting the corroborated and contradicted interactions. Default value is None, which skips the counting step.
kind_values (dict) – Dictionary assigning Kind Score values. Default values found in KIND_DICT_A and KIND_DICT_B.
match_values (dict) – Dictionary assigning Match Score values. Default values found in MATCH_DICT.
attributes (list) – List of attributes compared between the model and the machine reading output. Default is an empty list.
classify_scheme (str) – The scheme of the classification. Default value is ‘1’.
- Returns
scored – The reading dataframe with the added score columns.
- Return type
pd.DataFrame
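The functions above share one convention: each per-interaction scorer takes a row index `x` into the reading dataframe, and score_reading applies the scorers row by row and stores the results in new columns. The sketch below illustrates that pattern with a hypothetical stand-in scorer (`toy_match_score`) and hypothetical column names (`source`, `target`, `Match Score`, `Epistemic Value`). It does not reproduce VIOLIN's actual scoring rules.

```python
import pandas as pd

def toy_match_score(x: int, reading_df: pd.DataFrame, model_df: pd.DataFrame) -> int:
    """Hypothetical stand-in for match_score: 1 if the regulator is a model node, else 0."""
    model_nodes = set(model_df["source"]) | set(model_df["target"])
    return int(reading_df.loc[x, "source"] in model_nodes)

def add_score_columns(reading_df: pd.DataFrame, model_df: pd.DataFrame) -> pd.DataFrame:
    """Apply a per-row scorer to every row index and store the results in new columns."""
    scored = reading_df.copy()  # leave the caller's dataframe unchanged
    scored["Match Score"] = [toy_match_score(x, scored, model_df) for x in scored.index]
    # Mirror epistemic_value's documented default of 1 when no value is available.
    if "Epistemic Value" not in scored.columns:
        scored["Epistemic Value"] = 1.0
    return scored
```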
Dependencies
Python: pandas library
VIOLIN: network and numeric modules.
Defaults
Default Match Score values
Default Kind Score values
```python
# Default kind score dict - categories
# KIND_DICT = {"strong corroboration" : 2,
#              "empty attribute" : 1,
#              "indirect interaction" : 1,
#              "path corroboration" : 1,
#              "hanging extension" : 40,
#              "full extension" : 40,
#              "internal extension" : 40,
#              "specification" : 30,
#              "dir contradiction" : 10,
#              "sign contradiction" : 10,
#              "att contradiction" : 10,
```
Usage
scoring.score_reading assigns Kind Score values to the reading output as follows (excerpt):
```python
        kinds.append(kind_values['strong corroboration'])
    # Weak corroboration - the iIS presents less information than the model interaction
    elif compare_atts == 1:
        kinds.append(kind_values['empty attribute'])
    # Specification - the iIS presents new information
    elif compare_atts == 2:
        kinds.append(kind_values['specification'])
    # Contradiction - the iIS presents information that disputes the model interaction
    elif compare_atts == 3:
```
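The excerpt dispatches on compare_atts, the result of comparing the LEE's attributes with those of the model interaction. The following sketch shows one way such a comparison code could be computed. It is a hypothetical re-creation, not VIOLIN's implementation. Codes 1 and 2 follow the comments in the excerpt. The code 0 for strong corroboration, the "att contradiction" key for code 3, and the rule that new information outranks missing information are all assumptions, as are the attribute names in the usage.

```python
from typing import Dict, List, Optional

# Hypothetical mapping from comparison code to a kind_values key.
# Codes 1 and 2 follow the excerpt; codes 0 and 3 are assumptions.
KIND_KEY_BY_CODE = {
    0: "strong corroboration",
    1: "empty attribute",
    2: "specification",
    3: "att contradiction",
}

def compare_attributes(lee: Dict[str, Optional[str]],
                       mi: Dict[str, Optional[str]],
                       attributes: List[str]) -> int:
    """Compare LEE and model-interaction attributes; None marks an empty attribute."""
    less_info = more_info = False
    for att in attributes:
        lee_val, mi_val = lee.get(att), mi.get(att)
        if lee_val is None and mi_val is not None:
            less_info = True   # the LEE leaves empty something the model specifies
        elif lee_val is not None and mi_val is None:
            more_info = True   # the LEE adds something the model leaves empty
        elif lee_val != mi_val:
            return 3           # both specified but different: contradiction
    if more_info:
        return 2               # assumption: new information takes precedence
    if less_info:
        return 1
    return 0
```

A score could then be looked up as `kind_values[KIND_KEY_BY_CODE[code]]`, provided the dictionary contains those keys.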