Formatting (violin.formatting)

This page details the formatting functions of VIOLIN, used during model and reading input.

The formatting step is important, as it:

  • identifies duplicate interactions in the reading output,

  • counts the number of times an interaction was found in the reading (Evidence Score),

  • converts the variable representation of the model regulators into their common names.

The formatting functions are also responsible for inputting models and machine reading output that are not in the BioRECIPE or REACH format, respectively.

Functions

violin.formatting.evidence_score(reading_df: pandas.core.frame.DataFrame, col_names: list) → pandas.core.frame.DataFrame

This function merges duplicate interactions and calculates the evidence score of each interaction.

Parameters
  • reading_df (pd.DataFrame) – A dataframe of the interaction list in BioRECIPE format.

  • col_names (list) – A list of column headings used to determine if interactions are identical.

Returns

counted_reading – A new dataframe with the evidence count and PMCID list for each interaction.

Return type

pd.DataFrame
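
The merge-and-count step described for evidence_score can be illustrated with a plain pandas groupby. This is a minimal sketch, not VIOLIN's implementation: the function name count_evidence and the column names (regulator, regulated, sign, paper_id) are hypothetical and chosen only for illustration.

```python
import pandas as pd

def count_evidence(reading_df: pd.DataFrame, col_names: list) -> pd.DataFrame:
    """Merge rows that agree on col_names, count them, and collect paper IDs."""
    # sort=False keeps groups in order of first appearance
    grouped = reading_df.groupby(col_names, sort=False)
    counted = grouped.agg(
        # number of rows merged into each interaction
        evidence_score=("paper_id", "size"),
        # unique paper IDs supporting the interaction, comma-separated
        paper_ids=("paper_id", lambda ids: ",".join(sorted(set(ids)))),
    ).reset_index()
    return counted

# Hypothetical reading output: the first two rows describe the same interaction
reading = pd.DataFrame({
    "regulator": ["AKT", "AKT", "TP53"],
    "regulated": ["MTOR", "MTOR", "MDM2"],
    "sign": ["positive", "positive", "negative"],
    "paper_id": ["PMC1", "PMC2", "PMC3"],
})
counted = count_evidence(reading, ["regulator", "regulated", "sign"])
print(counted)
```

The choice of col_names decides what counts as a duplicate: with fewer columns, more rows are treated as identical and merged.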

violin.formatting.add_regulator_names_id(model_df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

This function converts the model regulator lists from ‘variables’ to the common element names and database identifiers.

Parameters

model_df (pd.DataFrame) – The model dataframe (in BioRECIPE format).

Returns

model_df – A new dataframe with added columns containing the positive and negative regulators listed by their Element Names and IDs.

Return type

pd.DataFrame
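
The idea of translating regulator variables into common names can be sketched with a simple lookup table. The example below is an illustration under stated assumptions, not the code of add_regulator_names_id: the helper add_names_column, the column names (Variable, Element Name, Positive Regulators, Negative Regulators), and the comma-separated regulator lists are all hypothetical, and it covers only names, not database identifiers.

```python
import pandas as pd

def add_names_column(model_df: pd.DataFrame, regulator_col: str, out_col: str) -> pd.DataFrame:
    """Return a copy of model_df with regulator variables translated to element names."""
    # Lookup table: model variable -> common element name
    var_to_name = dict(zip(model_df["Variable"], model_df["Element Name"]))

    def translate(cell):
        # Empty or missing regulator lists stay empty
        if pd.isna(cell) or str(cell).strip() == "":
            return ""
        variables = [v.strip() for v in str(cell).split(",")]
        # Variables missing from the lookup are left unchanged rather than dropped
        return ",".join(var_to_name.get(v, v) for v in variables)

    result = model_df.copy()
    result[out_col] = result[regulator_col].apply(translate)
    return result

# Hypothetical three-element model
model = pd.DataFrame({
    "Variable": ["AKT_cyto", "MTOR_cyto", "PTEN_cyto"],
    "Element Name": ["AKT", "MTOR", "PTEN"],
    "Positive Regulators": ["", "AKT_cyto", ""],
    "Negative Regulators": ["PTEN_cyto", "", ""],
})
named = add_names_column(model, "Positive Regulators", "Positive Regulator Names")
named = add_names_column(named, "Negative Regulators", "Negative Regulator Names")
print(named[["Element Name", "Positive Regulator Names", "Negative Regulator Names"]])
```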

violin.formatting.get_listname(idx: int, model_df: pandas.core.frame.DataFrame) → str

Creates a list name from an element's attributes. This function generates unique identifiers for elements in the model network using the following rules:

  • listname: {element_name}_{element_type}_{element_subtype}_{compartment_ID}

  • For elements that have multiple types or subtypes, the identifier includes only the first entry.

  • If any attribute is empty, it is replaced with ‘nan’ in the list name.

These unique identifiers are then used by VIOLIN for further manipulation of the network information.

Parameters
  • idx (int) – The row index of the element in the model file.

  • model_df (pd.DataFrame) – A dataframe of a model.

Returns

listname – A formatted name for the regulator list column.

Return type

str
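
The naming rules listed for get_listname can be sketched as follows. This is a minimal illustration rather than VIOLIN's own code: the helper build_listname, the column names (Element Name, Element Type, Element Subtype, Compartment ID), and the assumption that multiple types or subtypes are comma-separated are all hypothetical.

```python
import pandas as pd

def build_listname(idx: int, model_df: pd.DataFrame) -> str:
    """Join four element attributes with underscores, following the rules above."""
    row = model_df.loc[idx]
    parts = []
    for col in ("Element Name", "Element Type", "Element Subtype", "Compartment ID"):
        value = row[col]
        if pd.isna(value) or str(value).strip() == "":
            # Empty attributes are written as 'nan'
            parts.append("nan")
        elif col in ("Element Type", "Element Subtype"):
            # Assumed comma-separated: keep only the first type or subtype
            parts.append(str(value).split(",")[0].strip())
        else:
            parts.append(str(value).strip())
    return "_".join(parts)

# Hypothetical two-element model
model = pd.DataFrame({
    "Element Name": ["AKT1", "glucose"],
    "Element Type": ["protein,gene", "chemical"],
    "Element Subtype": ["kinase", None],
    "Compartment ID": ["GO:0005737", ""],
})
name_0 = build_listname(0, model)
name_1 = build_listname(1, model)
print(name_0)
print(name_1)
```

Under these assumptions, the first element keeps only its first type ("protein"), and the second element's missing subtype and compartment ID both become 'nan'.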

Dependencies

Python: pandas and NumPy libraries, as well as the os.path module

VIOLIN: none

Usage

This module is used during file input in the input/output module. For an example of using the convert functions, see Tutorial 4: Alternative Input.