Scoring (violin.scoring)

This page details the scoring functions of VIOLIN.

Match Score

The Match Score \(S_M\) measures how many new nodes are found in the reading with respect to the model. For an interaction from the reading A → B, where A is the regulator and B is the regulated node, this calculation considers four cases, which determine the scoring outcome:

  1. Both A and B are in the model

  2. A is in the model, B is not

  3. B is in the model, A is not

  4. Neither A nor B are in the model

Default Match Score values are given under the assumption that the user wants to extend a given model without adding new nodes which may not be useful to the network. Thus, new regulators and new edges between model nodes are considered most important.
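As an illustration, the following minimal sketch shows how the four cases above map onto the default Match Score values from match_dict (see Defaults below). The helper function and element names are hypothetical, not part of VIOLIN's API.

# Minimal sketch (not VIOLIN's implementation) of the four Match Score cases,
# using the default weights from match_dict.
match_values = {"source present": 1, "target present": 100,
                "both present": 10, "neither present": 0.1}

def match_case(source, target, model_elements, values=match_values):
    src_in = source in model_elements
    tgt_in = target in model_elements
    if src_in and tgt_in:
        return values["both present"]       # case 1: possible new edge between model nodes
    if src_in:
        return values["source present"]     # case 2: regulated node B is new
    if tgt_in:
        return values["target present"]     # case 3: regulator A is new
    return values["neither present"]        # case 4: both nodes are new

model_elements = {"AKT1", "FOXO3"}                   # hypothetical model node names
print(match_case("TP53", "FOXO3", model_elements))   # 100: new regulator of a model node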

Kind Score

The Kind Score \(S_K\) measures a reading interaction (LEE) against the corresponding model interaction (MI). The Kind Score identifies the classification of an interaction and, when the reading interaction is identified as indirect, searches for paths between nodes in the model. Using the same assumption as the Match Score calculation, the Kind Score represents the following scenarios:

Corroboration: LEE matches MI

Extension: LEE contains information not found in the model

Contradiction: LEE disputes information in MI

Flagged: Must be judged manually

Within each classification there are further sub-classifications, which allow for more detailed scoring if the user wishes.

Corroborations

Strong Corroboration: LEE matches MI exactly

Weak Corroboration Type 1: LEE matches the direction, sign, connection type, and node type of a model interaction but is missing additional attributes

Weak Corroboration Type 2: an indirect LEE matches the direction and sign of a direct model interaction with non-contradictory attributes

Weak Corroboration Type 3: an indirect LEE matches the direction and sign of a path in the model with non-contradictory attributes

Extensions

Full Extension: Neither source nor target of the LEE is in the model

Hanging Extension: The target of the LEE is in the model

Internal Extension: Both the source and target of the LEE are in the model, but there is no model interaction between them

Specification: LEE contains more information (attributes) than MI, or shows a direct relationship where the corresponding model connection is a path

Contradictions

Direction Contradiction: The target and source of the LEE correspond to the source and target of the model interaction, respectively

Sign Contradiction: The regulation sign of the LEE is the opposite of the corresponding model interaction (e.g. the LEE shows a positive regulation where the model interaction shows negative)

Attribute Contradiction: One or more of the LEE node attributes differs from that found in the corresponding model interaction

Flagged

Flagged Type 1: Mismatched direction and non-contradictory other attributes, with a direct connection type in the model

Flagged Type 2: An LEE whose corresponding model path has one or more mismatched attributes

Flagged Type 3: An LEE which is a self-regulation based on the definition of a model element (e.g. the LEE has caspase-8 -> caspase-3, but the model considers cas-8 and cas-3 to be the same element)
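As a rough illustration of this logic, the sketch below sorts an LEE into a few of these categories by comparing it with a single model interaction. The field names and rules are placeholders; the actual kind_score function also accounts for connection types, node attributes, and indirect paths in the model graph.

# Toy classification of an LEE against a corresponding model interaction (MI).
# Illustrative only; kind_score handles many more cases than shown here.
def classify(lee, mi):
    if mi is None:
        return "extension"              # no corresponding model interaction
    if (lee["source"], lee["target"]) == (mi["target"], mi["source"]):
        return "dir contradiction"      # edge direction is reversed
    if (lee["source"], lee["target"]) != (mi["source"], mi["target"]):
        return "extension"              # different nodes are involved
    if lee["sign"] != mi["sign"]:
        return "sign contradiction"     # e.g. positive vs. negative regulation
    return "strong corroboration"       # same edge, same sign

lee = {"source": "A", "target": "B", "sign": "positive"}
mi = {"source": "A", "target": "B", "sign": "negative"}
print(classify(lee, mi))                # sign contradiction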

Evidence Score

The Evidence Score \(S_E\) is a measure of how many times an LEE is found in the machine reading output. In the violin.formatting.evidence_score() function, column names are defined to determine how duplicates are identified. For example, the Evidence Score can be calculated by comparing all LEE attributes across all columns of the machine reading spreadsheet, so that only an exact match between LEEs is counted as a duplicate. However, the user can also specify fewer attributes, resulting in a more coarse-grained Evidence Score calculation.
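As an illustration of the effect of the chosen columns, the sketch below counts duplicates with pandas; the column names are placeholders, and the actual grouping is done by violin.formatting.evidence_score() using the columns the user specifies.

import pandas as pd

# Illustrative reading output; column names are placeholders.
reading = pd.DataFrame({"source": ["A", "A", "A", "C"],
                        "target": ["B", "B", "B", "D"],
                        "sign":   ["positive", "positive", "negative", "positive"]})

# Fine-grained: duplicates must match on every listed attribute.
fine = reading.groupby(["source", "target", "sign"]).size()    # A->B positive counted twice
# Coarse-grained: duplicates defined by source and target only.
coarse = reading.groupby(["source", "target"]).size()          # A->B counted three times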

Epistemic Value

In the NLP output, we sometimes receive an Epistemic Value \(S_B\), a measure of the believability of the LEE. Zero, Low, Moderate, and High believability correspond to numerical scores of 0.0, 0.33, 0.67, and 1.0, respectively.
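As an illustration only (not VIOLIN's internal representation), the mapping can be written as:

# Qualitative believability mapped to numeric S_B values (illustrative).
belief_to_value = {"zero": 0.0, "low": 0.33, "moderate": 0.67, "high": 1.0}
s_b = belief_to_value.get("moderate", 1.0)   # 0.67; default to 1.0 when no value is reported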

Total Score

The Total Score \(S_T\) is calculated as

\[S_T = [S_K + (S_E \cdot S_M)] \cdot S_B\]
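For example, with the default values, a hanging extension (\(S_K = 40\)) whose LEE was found three times in the reading (\(S_E = 3\)), adds a new regulator to a model node (\(S_M = 100\)), and has high believability (\(S_B = 1.0\)) receives

\[S_T = [40 + (3 \cdot 100)] \cdot 1.0 = 340\]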

Functions

scoring.match_score(x, reading_df, model_df, reading_cols, match_values={'both present': 10, 'neither present': 0.1, 'source present': 1, 'target present': 100})

This function calculates the Match Score for an interaction from the reading

Parameters
  • x (int) – The line of the reading dataframe with the interaction to be scored

  • reading_df (pd.DataFrame) – The reading dataframe

  • model_df (pd.DataFrame) – The model dataframe

  • reading_cols (dict) – Column Header names taken on input

  • match_values (dict) – Dictionary assigning Match Score values; default values are found in match_dict

Returns

match – Match Score value

Return type

int
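A hypothetical call, assuming reading_df, model_df, and reading_cols have already been prepared by VIOLIN's input and formatting steps (not shown here):

# Score the first LEE (row 0); match_values is omitted, so the defaults
# from match_dict are used.
s_m = match_score(0, reading_df, model_df, reading_cols)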

scoring.kind_score(x, model_df, reading_df, graph, reading_cols, kind_values={'att contradiction': 10, 'dir contradiction': 10, 'flagged1': 20, 'flagged2': 20, 'flagged3': 20, 'full extension': 40, 'hanging extension': 40, 'internal extension': 40, 'sign contradiction': 10, 'specification': 30, 'strong corroboration': 2, 'weak corroboration1': 1, 'weak corroboration2': 1, 'weak corroboration3': 1}, attributes=[], mi_cxn='d')

This function calculates the Kind Score for an interaction in the reading

Parameters
  • x (int) – The line of the reading dataframe with the interaction to be scored

  • model_df (pd.DataFrame) – The model dataframe

  • reading_df (pd.DataFrame) – The reading dataframe

  • graph (nx.DiGraph) – Directed graph of the model, used when the function calls the path_finding module

  • reading_cols (dict) – Column Header names taken on input

  • kind_values (dict) – Dictionary assigning Kind Score values; default values are found in kind_dict

  • attributes (list) – List of attributes compared between the model and the machine reading output; default is an empty list

  • mi_cxn (str) – Connection type assigned to model interactions when none is available; accepted values are “d” (direct) or “i” (indirect). Default is “d”

Returns

kind – Kind Score value

Return type

int
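A hypothetical call, assuming the model graph has already been built as an nx.DiGraph and that two placeholder attribute columns exist in both the model and the reading output:

# Score the first LEE; 'Location' and 'Mechanism' are placeholder attribute
# column names, and model interactions without a connection type are treated
# as direct ("d").
s_k = kind_score(0, model_df, reading_df, graph, reading_cols,
                 attributes=['Location', 'Mechanism'], mi_cxn='d')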

scoring.epistemic_value(x, reading_df)

Finds the epistemic value of the LEE (when available)

Parameters
  • x (int) – The line of the reading dataframe with the interaction to be scored

  • reading_df (pd.DataFrame) – The reading dataframe

Returns

e_value – The Epistemic Value; if there is no Epistemic Value available for the reading, the default is 1 for all LEEs

Return type

float
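A hypothetical call:

# Epistemic Value of the first LEE; returns 1 if the reading output does not
# include an Epistemic Value column.
s_b = epistemic_value(0, reading_df)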

scoring.score_reading(reading_df, model_df, graph, reading_cols, kind_values={'att contradiction': 10, 'dir contradiction': 10, 'flagged1': 20, 'flagged2': 20, 'flagged3': 20, 'full extension': 40, 'hanging extension': 40, 'internal extension': 40, 'sign contradiction': 10, 'specification': 30, 'strong corroboration': 2, 'weak corroboration1': 1, 'weak corroboration2': 1, 'weak corroboration3': 1}, match_values={'both present': 10, 'neither present': 0.1, 'source present': 1, 'target present': 100}, attributes=[], mi_cxn='d')

Creates new columns for the Match Score, Kind Score, Epistemic Value, and Total Score. Calls the scoring functions and stores the values in the appropriate columns.

Parameters
  • reading_df (pd.DataFrame) – The reading dataframe

  • model_df (pd.DataFrame) – The model dataframe

  • graph (nx.DiGraph) – Directed graph of the model, necessary when calling the kind_score function

  • reading_cols (dict) – Column Header names taken upon input

  • kind_values (dict) – Dictionary assigning Kind Score values; default values are found in kind_dict

  • match_values (dict) – Dictionary assigning Match Score values; default values are found in match_dict

  • attributes (list) – List of attributes compared between the model and the machine reading output; default is an empty list

  • mi_cxn (str) – Connection type assigned to model interactions when none is available; accepted values are “d” (direct) or “i” (indirect). Default is “d”

Returns

scored_reading_df – reading dataframe with added score columns

Return type

pd.DataFrame
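A hypothetical end-to-end call, assuming the reading dataframe, model dataframe, model graph, and column mapping have already been prepared by VIOLIN's earlier steps:

# Score every LEE with the default weights and save the result; the output
# path is illustrative.
scored_df = score_reading(reading_df, model_df, graph, reading_cols)
scored_df.to_csv("scored_reading.csv", index=False)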

Dependencies

Python: pandas library

VIOLIN: network and numeric modules.

Defaults

Default Match Score values

match_dict = {"source present" : 1,
              "target present" : 100,
              "both present" : 10,
              "neither present" : 0.1}

Default Kind Score values

kind_dict = {"strong corroboration" : 2,
             "weak corroboration1" : 1,
             "weak corroboration2" : 1,
             "weak corroboration3" : 1,
             "hanging extension" : 40,
             "full extension" : 40,
             "internal extension" : 40,
             "specification" : 30,
             "dir contradiction" : 10,
             "sign contradiction" : 10,
             "att contradiction" : 10,
             "flagged1" : 20,
             "flagged2" : 20,
             "flagged3" : 20}
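These defaults can be replaced by passing complete dictionaries with the same keys to the scoring functions; the weights below are illustrative only:

# Illustrative custom Match Score weights; every key must be present.
my_match_values = {"source present": 1,
                   "target present": 100,
                   "both present": 50,       # e.g. weight new edges between model nodes higher
                   "neither present": 0}

scored_df = score_reading(reading_df, model_df, graph, reading_cols,
                          match_values=my_match_values)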

Usage

scoring.score_reading scores the reading output in the following manner:

scored_reading_df['Total Score'] = pd.Series()
print(reading_df.shape[0])
# Calculate scores
for x in range(reading_df.shape[0]):
    scored_reading_df.at[x,'Match Score'] = match_score(x,reading_df,model_df,reading_cols,match_values)
    scored_reading_df.at[x,'Kind Score'] = kind_score(x,model_df,reading_df,graph,reading_cols,kind_values,attributes,mi_cxn)
    scored_reading_df.at[x,'Epistemic Value'] = epistemic_value(x,reading_df)
    scored_reading_df.at[x,'Total Score'] = ((scored_reading_df.at[x,'Evidence Score']*scored_reading_df.at[x,'Match Score'])+scored_reading_df.at[x,'Kind Score'])*scored_reading_df.at[x,'Epistemic Value']