VIOLIN Function References

Input and Output Functions (`violin.in_out`)

This section details the functions which handle the input files and output of VIOLIN.

For more information on the types of accepted inputs, see BioRECIPE.

Functions

violin.in_out.preprocessing_model(model: str) → pandas.core.frame.DataFrame

This function checks whether the model is correct and verifies that all necessary columns are present.

It accepts an executable BioRECIPE model provided in .txt, .csv, .xlsx, or .tsv format. The file’s content is converted to lower case. Additionally, a ‘Listname’ is created as a unique identifier for every element, for further indexing.

Parameters: model (str) – The name of a file containing an executable BioRECIPE model.
Returns: new_model – A formatted model dataframe.
Return type: pd.DataFrame

violin.in_out.preprocessing_reading(reading: str, evidence_score_cols: Optional[dict] = None, atts: Optional[list] = None) → pandas.core.frame.DataFrame

This function imports the reading file and checks whether the reading format is correct.

Parameters

reading (str) – The path to the machine reading spreadsheet output, or to an interaction set from a database, in BioRECIPE format. Accepted file types: .txt, .csv, .tsv, .xlsx.
evidence_score_cols (list) – A list of column headings used to identify identical interactions.
atts (list) – A list of additional attributes which are available in the interaction set. Default is None.

Returns

new_reading – A formatted reading dataframe, including evidence count and list of PMCIDs.

Return type

pd.DataFrame

violin.in_out.output(reading_df: pandas.core.frame.DataFrame, file_name: str, classify_scheme: str = '1', kind_values: Optional[dict] = None) → None

This function outputs the classified interactions. Output filenames follow the pattern {file_name_prefix}_{category}.csv.

Parameters

reading_df (pd.DataFrame) – A classified dataframe of an interaction set.
file_name (str) – The prefix of the output filenames.
classify_scheme (str) – The classification scheme; available options are ‘1’, ‘2’, and ‘3’.
kind_values (dict) – A dictionary containing the numerical values for the Kind Score classifications. Default values are found in KIND_DICT.

Defaults

Default Reading Columns

BioRECIPE_READING_COL = ["Regulator Name", "Regulator Type", "Regulator Subtype", "Regulator HGNC Symbol",
                         "Regulator Database", "Regulator ID", "Regulator Compartment", "Regulator Compartment ID",
                         "Regulated Name", "Regulated Type", "Regulated Subtype", "Regulated HGNC Symbol",
                         "Regulated Database", "Regulated ID", "Regulated Compartment", "Regulated Compartment ID",
                         "Sign", "Connection Type", "Mechanism", "Site",
                         "Cell Line", "Cell Type", "Tissue Type", "Organism"]

Default Model Columns (From BioRECIPE format)

                "path mismatch" : 19,
                "self-regulation" : 18,
                "flagged4" : 17,
                "flagged5" : 16}

Formatting Functions (`violin.formatting`)

This section details the formatting functions of VIOLIN, used when preprocessing the model and the interaction list.

The formatting step:

identifies duplicate interactions in the interactions list,
counts the number of times an interaction was found in the interactions list (Evidence Score),
creates a unique identifier for every element based on their name, type, subtype, and compartment ID.

Functions

violin.formatting.evidence_score(reading_df: pandas.core.frame.DataFrame, col_names: list) → pandas.core.frame.DataFrame

This function merges duplicate interactions and calculates the evidence score of each interaction.

Parameters

reading_df (pd.DataFrame) – A dataframe of the interaction list in BioRECIPE format.
col_names (list) – A list of column headings used to determine if interactions are identical.

Returns

counted_reading – A new dataframe with the evidence count and PMCID list for each interaction.

Return type

pd.DataFrame
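
The merge-and-count idea can be illustrated without VIOLIN itself. The following is a minimal sketch in plain Python, not the library's implementation: the helper `count_evidence`, the row dictionaries, and the column names `"paper ids"` and `"evidence score"` are all hypothetical and chosen only for illustration.

```python
def count_evidence(rows, col_names, id_col="paper ids"):
    """Merge rows that agree on every column in col_names and count them.

    rows is a list of dicts (one per interaction); id_col is a hypothetical
    column holding the paper identifier of each row.
    """
    merged = {}
    for row in rows:
        # Rows with the same key are treated as the same interaction.
        key = tuple(row.get(col) for col in col_names)
        if key not in merged:
            entry = dict(row)
            entry["evidence score"] = 0
            entry[id_col] = []
            merged[key] = entry
        merged[key]["evidence score"] += 1
        paper_id = row.get(id_col)
        if paper_id and paper_id not in merged[key][id_col]:
            merged[key][id_col].append(paper_id)
    return list(merged.values())


rows = [
    {"regulator name": "akt1", "regulated name": "mtor", "sign": "positive", "paper ids": "PMC1"},
    {"regulator name": "akt1", "regulated name": "mtor", "sign": "positive", "paper ids": "PMC2"},
    {"regulator name": "pten", "regulated name": "akt1", "sign": "negative", "paper ids": "PMC3"},
]
counted = count_evidence(rows, ["regulator name", "regulated name", "sign"])
for entry in counted:
    print(entry["regulator name"], entry["regulated name"], entry["evidence score"], entry["paper ids"])
```

Passing fewer column names makes more rows count as duplicates, which is the coarse-grained behaviour that the col_names parameter allows.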

violin.formatting.get_listname(idx: int, model_df: pandas.core.frame.DataFrame) → str

Creates the listname from element attributes. This function generates unique identifiers for elements in the model network using the rule:

listname: {element_name}_{element_type}_{element_subtype}_{compartment_ID}

For elements that have multiple types and subtypes, the identifier only includes the first entry.

If any attribute is empty, it is replaced with ‘nan’ in the listname.

These unique identifiers are then used by VIOLIN for further manipulation of the network information.

Parameters

idx (int) – The row index of the element in the model file.
model_df (pd.DataFrame) – A dataframe of a model.

Returns

listname – A formatted name for regulator list column.

Return type

str
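
As an illustration of the naming rule above, here is a minimal sketch that builds a listname from four attribute values. It is not VIOLIN's code: the helper `make_listname` is hypothetical, and both the comma as the separator between multiple entries and the lower-casing of values are assumptions.

```python
def make_listname(name, element_type, subtype, compartment_id):
    """Build '{name}_{type}_{subtype}_{compartment_ID}' for one element."""

    def first_or_nan(value):
        # Empty or missing attributes become the literal string 'nan'.
        if value is None or str(value).strip() == "":
            return "nan"
        # Assumption: multiple entries are comma-separated; keep the first.
        return str(value).split(",")[0].strip().lower()

    parts = [first_or_nan(v) for v in (name, element_type, subtype, compartment_id)]
    return "_".join(parts)


print(make_listname("AKT1", "protein, gene", "", "GO:0005737"))
```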

Network Functions (`violin.network`)

This page details how paths are defined and found in the model in VIOLIN. Because of the compact nature of the BioRECIPE model format, the model must be converted into a node-edge list for use with the NetworkX Python package.

One special feature of VIOLIN is its ability to compare interactions from machine reading output, or from a database interaction set, to paths that exist in the model. For two nodes, E1 and Ex, an interaction in the Interactions Set (iIS) may exist with E1 regulating Ex. If the model contains a path of multiple interactions, where E1 regulates E2, which regulates E3, and so on to Ex, VIOLIN can identify this path and compare the iIS to it as a whole. An indirect iIS may be a path corroboration of the model interactions, or a direct iIS may be a specification, identifying a more direct relationship between two nodes than is given in the model. This functionality reduces the number of false extensions.

Functions

violin.network.node_edge_list(model_df: pandas.core.frame.DataFrame) → networkx.classes.digraph.DiGraph

This function converts a model from the BioRECIPE format into a node-edge list for use with NetworkX. The converted network is a directed graph.

Parameters: model_df (pd.DataFrame) – A model dataframe, must be in BioRECIPE format.
Returns: node_edge_list – A directed graph representation of the model.
Return type: nx.DiGraph
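
To show what converting a compact, element-per-row model into a node-edge list can look like, here is a minimal sketch in plain Python. It does not use VIOLIN or NetworkX, and the row keys `"listname"`, `"positive regulators"`, and `"negative regulators"` are hypothetical stand-ins for the real BioRECIPE columns.

```python
def model_to_edges(model_rows):
    """Expand one-row-per-element records into (regulator, regulated, sign) edges."""
    edges = []
    for row in model_rows:
        target = row["listname"]
        for column, sign in (("positive regulators", "positive"),
                             ("negative regulators", "negative")):
            # Assumption: regulators are stored as a comma-separated string.
            for regulator in row.get(column, "").split(","):
                regulator = regulator.strip()
                if regulator:
                    edges.append((regulator, target, sign))
    return edges


model_rows = [
    {"listname": "e2", "positive regulators": "e1", "negative regulators": ""},
    {"listname": "e3", "positive regulators": "", "negative regulators": "e2, e1"},
]
print(model_to_edges(model_rows))
```

Each resulting tuple corresponds to one directed edge, so a list like this could be loaded into a directed graph structure.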

violin.network.path_finding(regulator: str, regulated: str, sign: str, model_df: pandas.core.frame.DataFrame, graph: networkx.classes.digraph.DiGraph, kind_values: dict, reading_cxn_type: str, reading_atts: dict, attributes: list, scheme='1') → Union[str, int]

This function searches for a path in the model, where the source and target are the identifiers of the matched elements, and calculates the Kind Score based on the result. Dijkstra’s algorithm is used to find the shortest path. The path is identified as a negative regulation between source and target if the sum of its edge weights is an odd number, and as a positive regulation otherwise.

Parameters

regulator (str) – An identifier of the source node.
regulated (str) – An identifier of the target node.
sign (str) – The sign of the interaction from the Interaction Set (IS). Available options: [‘positive’, ‘negative’].
model_df (pd.DataFrame) – The model dataframe.
graph (nx.DiGraph) – Model edge list used to create the network for finding paths between elements.
kind_values (dict) – Dictionary containing the numerical values for the Kind Score classifications.
reading_cxn_type (str) – Connection Type of the interaction from the reading: ‘i’ for indirect, ‘d’ for direct.
reading_atts (dict) – Attributes of the interaction, where keys are attribute names and values are attribute values.
attributes (list) – List of attributes for the reading file.
scheme (str) – The scheme of classification, i.e. ‘1’, ‘2’, or ‘3’.

Returns

kind – Kind Score value for the interaction.

Return type

int
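
The parity rule for the sign of a path can be sketched with a small Dijkstra search written with the standard library. This is an illustration only, not VIOLIN's NetworkX-based code: the helper `shortest_path_sign` is hypothetical, and it assumes that path length is measured in number of edges while each negative edge contributes a sign weight of 1 and each positive edge a weight of 0.

```python
import heapq


def shortest_path_sign(edges, source, target):
    """Return (path, sign) for the shortest path, or (None, None) if none exists.

    edges is an iterable of (regulator, regulated, sign) tuples.
    """
    graph = {}
    for regulator, regulated, sign in edges:
        sign_weight = 1 if sign == "negative" else 0
        graph.setdefault(regulator, []).append((regulated, sign_weight))

    # Queue entries: (edges travelled, negative edges so far, node, path).
    queue = [(0, 0, source, [source])]
    visited = set()
    while queue:
        hops, negatives, node, path = heapq.heappop(queue)
        if node == target:
            # An odd number of negative edges makes the whole path negative.
            return path, ("negative" if negatives % 2 else "positive")
        if node in visited:
            continue
        visited.add(node)
        for neighbor, sign_weight in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(
                    queue,
                    (hops + 1, negatives + sign_weight, neighbor, path + [neighbor]),
                )
    return None, None


edges = [("e1", "e2", "positive"), ("e2", "e3", "negative"), ("e3", "ex", "positive")]
print(shortest_path_sign(edges, "e1", "ex"))
```

With one negative edge on the path from e1 to ex, the path as a whole is treated as a negative regulation; a second negative edge would turn it positive again.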

Numeric Functions (`violin.numeric`)

This section describes the numeric functions of VIOLIN, which handle:

searching for an element in the machine reading output or the interaction set from databases,
comparing attributes, identifying whether a given attribute
- exactly matches the attribute in a corresponding model interaction,
- is missing where a model interaction attribute is present,
- is present where a model interaction attribute is missing,
- differs from the attribute in a corresponding model interaction.

Both the search and the comparison return numerical values that represent their outcome.

Functions

violin.numeric.get_attributes(A_idx: int, B_idx: int, sign: str, model_df: pandas.core.frame.DataFrame, attrs: list, path: bool = False) → dict

This function gets the attributes of an interaction in the model. Available attributes include [Regulator Compartment, Regulator Compartment ID, Regulated Compartment, Regulated Compartment ID, Mechanism, Site, Cell Line, Cell Type, Tissue Type, Organism]. If Regulator Compartment is selected, Regulator Compartment ID will also be selected.

Parameters

A_idx (int) – A row index of element A in the input model dataframe.
B_idx (int) – A row index of element B in the input model dataframe.
sign (str) – A sign of the interaction, available options: ‘positive’ or ‘negative’.
model_df (pd.DataFrame) – A DataFrame of a model with BioRECIPE format.
attrs (list) – An attributes list for interactions file.
path (bool) – Indicates whether the interaction is a path. Attributes will be empty if only a path is found in the model.

Returns

model_atts – A dict of attributes for a model interaction.

Return type

dict

violin.numeric.find_element(search_type: str, element_name: str, element_type: str, model_df: pandas.core.frame.DataFrame, id_db: Optional[str] = None) → Union[List, int]

This function finds the correct indices of an element within the model. Because elements can exist as multiple types (protein, RNA, gene, etc.), this function checks the element name/ID along with the element type. The function may return a list if a given element of a specific type exists with varying attributes (such as different locations).

Parameters

search_type (str) – An identifier of the element, available options are ‘hgnc’, ‘name’, and ‘id’.
element_name (str) – A name (or ID) of the element being searched for.
element_type (str) – A type of element (‘protein’, ‘protein family’, etc.)
model_df (pd.DataFrame) – A model dataframe within BioRECIPE format.
id_db (str) – A database name for provided identifier.

Returns

location – All row indices of the model spreadsheet in which the element is found (returns -1 if not found).

Return type

list|int

violin.numeric.compare(model_atts: dict, reading_atts: dict) → int

Compares the model attributes to the corresponding interaction attributes and returns a numeric value:

Attributes are the same (strong corroboration): 0
Some or all LEE attributes are missing (weak corroboration): 1
Some or all of the model attributes are missing (specification): 2
One or more model attribute differs from the LEE attributes (contradiction): 3

Parameters

model_atts (dict) – A dictionary of attributes for a model interaction.
reading_atts (dict) – A dictionary of attributes for an event from an interactions list.

Returns

value – The numerical representation of comparison outcome.

Return type

int
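
The four outcomes listed for this comparison can be sketched as follows. This is not the library's implementation: the helper `compare_attributes` is hypothetical, and the precedence used when several conditions hold at once (contradiction first, then missing model attributes, then missing reading attributes) is an assumption.

```python
def compare_attributes(model_atts, reading_atts):
    """Return 0 (same), 1 (reading attributes missing), 2 (model attributes missing) or 3 (different)."""

    def is_empty(value):
        return value is None or str(value).strip().lower() in ("", "nan")

    contradiction = reading_missing = model_missing = False
    for key in set(model_atts) | set(reading_atts):
        model_value = model_atts.get(key)
        reading_value = reading_atts.get(key)
        if is_empty(model_value) and is_empty(reading_value):
            continue  # nothing to compare for this attribute
        if is_empty(reading_value):
            reading_missing = True
        elif is_empty(model_value):
            model_missing = True
        elif str(model_value).strip().lower() != str(reading_value).strip().lower():
            contradiction = True

    # Assumed precedence: a single differing attribute outweighs missing ones.
    if contradiction:
        return 3
    if model_missing:
        return 2
    if reading_missing:
        return 1
    return 0


print(compare_attributes({"site": "s473"}, {"site": "s473"}))
print(compare_attributes({"site": "s473"}, {"site": "t308"}))
```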

Scoring (`violin.scoring`)

This part details the scoring functions of VIOLIN.

Match Score

The Match Score (S_M) measures how many new nodes are found in the interactions set with respect to the model. For an interaction in the Interactions Set (iIS) A → B, where A is the regulator and B is the regulated node, this calculation considers 4 cases which determine the scoring outcome:

Both A and B are in the model
A is in the model, B is not
B is in the model, A is not
Neither A nor B is in the model

The default Match Score values assume that the user wants to extend a given model without adding new nodes which may not be useful to the network. Thus, new regulators and new edges between model nodes are considered most important.
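
The four cases can be told apart with two membership checks. The sketch below assumes the model's elements are available as a set of identifiers; the helper `match_case` and its return labels are hypothetical and do not reproduce the numerical values stored in MATCH_DICT.

```python
def match_case(regulator, regulated, model_elements):
    """Label which of the four Match Score cases applies to the interaction A -> B."""
    a_in_model = regulator in model_elements
    b_in_model = regulated in model_elements
    if a_in_model and b_in_model:
        return "both in model"
    if a_in_model:
        return "regulator only"   # A is in the model, B is not
    if b_in_model:
        return "regulated only"   # B is in the model, A is not
    return "neither in model"


model_elements = {"akt1", "mtor"}
print(match_case("akt1", "mtor", model_elements))
print(match_case("pten", "akt1", model_elements))
```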

Kind Score

The Kind Score (S_K) compares the edge of an iIS with the corresponding model interaction (MI). It identifies the classification of an interaction and, when the iIS is indirect, also searches for paths between nodes in the model. Using the same assumption as the Match Score calculation, the Kind Score represents the following scenarios:

Classification	Definition
Corroboration	iIS matches model interaction
Extension	iIS contains information not found in model
Contradiction	iIS disputes information in MI
Flagged	Must be judged manually

Within each classification there are further sub-classifications, which allow for more detailed scoring if the user wishes.

Corroborations

Strong Corroboration: iIS matches MI exactly

Weak Corroboration Type 1: iIS matches direction, sign, connection type, and node type, of a model interaction but is missing additional attributes

Weak Corroboration Type 2: an indirect iIS matches direction and sign of direct model interaction with non-contradictory attributes

Weak Corroboration Type 3: an indirect iIS matches the direction and sign of a path in the model with non-contradictory attributes

Extensions

Full Extension: Neither source nor target of the iIS is in the model

Hanging Extension: The target of the iIS is in the model

Internal Extension: Both the source and target of the iIS are in the model, but there is no model interaction between them

Specification: iIS contains more information (attributes) than MI, or shows a direct relationship compared to Model Path

Contradictions

Direction Contradiction: The target and source of the iIS correspond to the source and target of the model interaction, respectively

Sign Contradiction: The regulation sign of the iIS is opposite of the corresponding model interaction (e.g. the iIS shows a positive regulation where the model interaction shows negative)

Attribute Contradiction: One or more of the iIS node attributes differs from that found in the corresponding model interaction

Flagged

Flagged Type 1: Mismatched Direction and non-contradictory Other Attributes with a Direct connection type in the model

Flagged Type 2: An iIS with a corresponding path which has one or more Mismatched Attributes

Flagged Type 3: An iIS which is a self-regulation based on the definition of model element (e.g. iIS has caspase-8 –> caspase-3, but the model considers cas-8 and cas-3 to be the same element)

Evidence Score

The Evidence Score (S_E) is a measure of how many times an iIS is found. The column names passed to the violin.formatting.evidence_score() function determine which interactions count as duplicates. For example, the Evidence Score can be calculated by comparing all iIS attributes, that is, all the columns of the interactions set, so that only an exact match between iISs is counted as a duplicate. However, the user can also define fewer attributes, creating a more coarse-grained Evidence Score calculation.

Epistemic Value

In the NLP output, we sometimes receive an Epistemic Value (S_B), which is a measure of the believability of an iIS. Zero, Low, Moderate, and High believability correspond to numerical scores of 0.0, 0.33, 0.67, and 1.0, respectively.

Total Score

The total score (S_T) is calculated by

\[S_T = [S_K + (S_E*S_M)]*S_B\]
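
As a worked instance of this formula, the sketch below combines the four scores for one interaction. The function `total_score` and the numbers passed to it are hypothetical and chosen only for illustration; the believability mapping is the one given in the Epistemic Value section, and the default of 1 follows the behaviour described for interactions without an Epistemic Value.

```python
# Believability levels and their numerical scores, as listed under Epistemic Value.
EPISTEMIC_SCORES = {"zero": 0.0, "low": 0.33, "moderate": 0.67, "high": 1.0}


def total_score(kind, evidence, match, epistemic=1.0):
    """S_T = [S_K + (S_E * S_M)] * S_B."""
    return (kind + evidence * match) * epistemic


# Hypothetical values: S_K = 10, S_E = 3, S_M = 2, no Epistemic Value available.
print(total_score(kind=10, evidence=3, match=2))
# The same interaction with 'moderate' believability.
print(total_score(kind=10, evidence=3, match=2, epistemic=EPISTEMIC_SCORES["moderate"]))
```

Because S_B multiplies the whole bracket, an interaction with zero believability always receives a Total Score of 0, regardless of its other scores.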

Functions

violin.scoring.match_score(x: int, reading_df: pandas.core.frame.DataFrame, model_df: pandas.core.frame.DataFrame, match_values: Optional[dict] = None) → int

This function calculates the Match Score for an interaction from the reading.

Parameters

x (int) – The row index, in the Interaction Set (IS) dataframe, of the interaction to be scored.
reading_df (pd.DataFrame) – The reading dataframe.
model_df (pd.DataFrame) – The model dataframe.
match_values (dict) – Dictionary assigning Match Score values. Default values found in MATCH_DICT.

Returns

match – Match Score value

Return type

int

violin.scoring.kind_score(x: int, model_df: pandas.core.frame.DataFrame, reading_df: pandas.core.frame.DataFrame, graph: networkx.classes.digraph.DiGraph, counter: dict, kind_values: Optional[dict] = None, attributes: Optional[list] = None, classify_scheme: str = '1', mi_cxn: str = 'd') → int

This function calculates the Kind Score for an interaction in the Interactions Set (iIS). The Kind Score is used to represent the subcategories. For further details, see https://www.biorxiv.org/content/10.1101/2024.07.21.604448v1.

Parameters

x (int) – The row index for an iIS.
model_df (pd.DataFrame) – The model dataframe.
reading_df (pd.DataFrame) – The reading dataframe.
graph (nx.DiGraph) – A directed graph of the model, used when the function calls the path_finding module.
counter (dict) – A dictionary to record the interactions that are identified as corroborated or contradicted interactions in the model. Default value is None.
kind_values (dict) – Dictionary assigning Kind Score values. Default values found in KIND_DICT_A and KIND_DICT_B.
attributes (list) – A list of attributes compared between the model and the machine reading output. Default is None.
classify_scheme (str) – The scheme of the classification (‘1’, ‘2’, or ‘3’). Default is ‘1’.
mi_cxn (str) – The connection type assigned to model interactions when none is available. Accepted values are “d” (direct) or “i” (indirect). Default is “d”.

Returns

kind – Kind Score value for the interaction.

Return type

int

violin.scoring.epistemic_value(x: int, reading_df: pandas.core.frame.DataFrame) → float

Finds the epistemic value of the interaction in Interaction Set (IS) (when available).

Parameters

x (int) – The row index for an iIS.
reading_df (pd.DataFrame) – An IS dataframe.

Returns

e_value – The Epistemic Value; if there is no Epistemic Value available for the reading, default is 1 for all interactions in IS.

Return type

float

violin.scoring.score_reading(reading_df: pandas.core.frame.DataFrame, model_df: pandas.core.frame.DataFrame, graph: networkx.classes.digraph.DiGraph, counter: Optional[dict] = None, kind_values: Optional[dict] = None, match_values: Optional[dict] = None, attributes: list = [], classify_scheme: str = '1', mi_cxn: str = 'd') → pandas.core.frame.DataFrame

This function creates new columns for the Match Score, Kind Score, Epistemic Value, and Total Score. It calls the scoring functions and stores the values in the appropriate columns.

Parameters

reading_df (pd.DataFrame) – The reading dataframe.
model_df (pd.DataFrame) – The model dataframe.
graph (nx.DiGraph) – A directed graph of the model, necessary for calling the kind_score module.
counter (dict) – A dictionary for counting the corroborated and contradicted interactions. Default value is None, in which case the counting step is skipped.
kind_values (dict) – Dictionary assigning Kind Score values. Default values found in KIND_DICT_A and KIND_DICT_B.
match_values (dict) – Dictionary assigning Match Score values. Default values found in MATCH_DICT.
attributes (list) – List of attributes compared between the model and the machine reading output. Default is an empty list.
classify_scheme (str) – The scheme of the classification. Default value is ‘1’.

Returns

scored = reading_df – reading dataframe with added scores.

Return type

pd.DataFrame

Visualization (`violin.visualize_violin`)

VIOLIN’s visualization function creates a visual summary of the VIOLIN output, including total score, evidence score, and match score distributions.

The visualization function includes a filtering option, which can help the user make choices on how to use the VIOLIN output. Visualization can be filtered by three possible metrics:

“%x” : Returns the top X% of iISs, by Total Score
“Se>y” : Returns all iISs with an Evidence Score greater than Y
“St>z” : Returns all iISs with a Total Score greater than Z
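
The three filter formats can be recognised with a pair of regular expressions. The following is a minimal sketch, not the parsing code used by ViolinPlot: the helper `parse_filter` and the labels it returns are hypothetical.

```python
import re


def parse_filter(filter_opt):
    """Split a filter string such as '25%', 'Se>2' or 'St>5.5' into (metric, threshold)."""
    text = filter_opt.strip()
    percent = re.fullmatch(r"(\d+(?:\.\d+)?)%", text)
    if percent:
        return ("top_percent", float(percent.group(1)))
    threshold = re.fullmatch(r"(Se|St)>(\d+(?:\.\d+)?)", text)
    if threshold:
        metric = "evidence_score" if threshold.group(1) == "Se" else "total_score"
        return (metric, float(threshold.group(2)))
    raise ValueError(f"Unrecognised filter option: {filter_opt!r}")


for option in ("100%", "Se>2", "St>5.5"):
    print(option, parse_filter(option))
```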

When visualizing the total output, this function shows the score distributions by classification, as well as the classification distribution.
When visualizing the output of a single classification, the classification distribution is replaced by the number of iISs given that classification.
When subcategories are identified in the Kind Score definition, additional plots of the subcategory distribution are included.

Class

class violin.visualize_violin.ViolinPlot(file_name: str, filter_opt: str = '100%', match_values: Optional[dict] = None, kind_values: Optional[dict] = None, classify_scheme: str = '1')

This class creates figures of the VIOLIN output: the evidence score, match score, and total score distributions, and the classification breakdown.

Parameters

match_values (dict) – Dictionary assigning Match Score Values.
kind_values (dict) – Dictionary assigning Kind Score values.
file_name (str) – VIOLIN output to be visualized. This can be the file for a specific classification; choosing the ‘TotalOutput’ file will visualize all VIOLIN output.
filter_opt (str) – How much VIOLIN output should be visualized. The output can be filtered by top % of total score, evidence score (Se) threshold, or total score (St) threshold. Accepted options are ‘X%’, ‘Se>Y’, or ‘St>Z’, where X, Y, and Z are values. Default is ‘100%’ (Total Output).

get_category_summary(category: str, save_name: str = '', save=False) → None: Plots the scores (evidence, match, total) for the specified category.

get_pie_plots(out_file: str = '', save=True, show=False) → None: Creates pie charts of the classification distribution in the VIOLIN output.

get_summary_plots(save=False, merge=True) → None: Creates a summary plot composed of the category distribution, evidence score, match score, and total score.
