Formatting (violin.formatting)
This page details the formatting functions of VIOLIN, used during model and reading input.
The formatting step is important, as it:
identifies duplicate interactions in the reading output,
counts the number of times an interaction was found in the reading (Evidence Score), and
converts the variable representation of the model regulators into their common names.
The formatting functions are also responsible for reading in models that are not in the BioRECIPE format and machine reading output that is not in the REACH format.
Functions
- violin.formatting.evidence_score(reading_df: pandas.core.frame.DataFrame, col_names: list) → pandas.core.frame.DataFrame
This function merges duplicate interactions and calculates the evidence score of each interaction.
- Parameters
reading_df (pd.DataFrame) – A dataframe of the interaction list in BioRECIPE format.
col_names (list) – A list of column headings used to determine if interactions are identical.
- Returns
counted_reading – A new dataframe with the evidence count and PMCID list for each interaction.
- Return type
pd.DataFrame
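As a rough illustration of the merge-and-count step described above, the sketch below groups rows that agree on every column in col_names, counts them, and collects their PMCIDs. It is not VIOLIN's implementation: the function name evidence_score_sketch, the input column names (regulator, regulated, sign, pmcid), and the output column names are all assumptions made for this example.

```python
import pandas as pd


def evidence_score_sketch(reading_df: pd.DataFrame, col_names: list) -> pd.DataFrame:
    """Merge duplicate interactions and count how often each one occurs.

    Hypothetical stand-in for violin.formatting.evidence_score; the
    'pmcid' column and the output column names are assumptions.
    """
    # Rows that agree on every column in col_names count as one interaction.
    grouped = reading_df.groupby(col_names, sort=False)
    counted = grouped.agg(
        # Number of rows in the group, i.e. how often the interaction was read.
        evidence_score=("pmcid", "size"),
        # Unique source papers for the interaction, as a comma-separated string.
        pmcid_list=("pmcid", lambda ids: ",".join(sorted(set(ids)))),
    ).reset_index()
    return counted


# Toy reading output: the AKT -> MTOR interaction appears in two papers.
reading = pd.DataFrame(
    {
        "regulator": ["AKT", "AKT", "TP53"],
        "regulated": ["MTOR", "MTOR", "MDM2"],
        "sign": ["positive", "positive", "positive"],
        "pmcid": ["PMC1", "PMC2", "PMC3"],
    }
)
counted = evidence_score_sketch(reading, ["regulator", "regulated", "sign"])
print(counted)
```

Which columns go into col_names decides how strict the duplicate test is: leaving out a column such as sign would merge interactions that differ only in that attribute.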
- violin.formatting.add_regulator_names_id(model_df: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
This function converts the model regulator lists from ‘variables’ to the common element names and database identifiers.
- Parameters
model_df (pd.DataFrame) – The model dataframe (in BioRECIPE format).
- Returns
model_df – A new dataframe with added columns containing the positive and negative regulators listed by their Element Names and IDs.
- Return type
pd.DataFrame
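The conversion from variables to names and IDs can be pictured as a lookup applied to each regulator list. The sketch below is only an illustration under stated assumptions: the column names (variable, element_name, element_id, positive_regulators, negative_regulators), the new column names, and the use of a plain comma-separated regulator list are all hypothetical and may differ from what VIOLIN actually expects. The identifiers in the toy table are for illustration only.

```python
import pandas as pd


def add_regulator_names_sketch(model_df: pd.DataFrame) -> pd.DataFrame:
    """Add regulator columns expressed as element names and IDs.

    Hypothetical stand-in for violin.formatting.add_regulator_names_id;
    every column name used here is an assumption.
    """
    out = model_df.copy()  # leave the caller's dataframe untouched

    # Lookup tables from model variable to common name and database ID.
    names = dict(zip(out["variable"], out["element_name"]))
    ids = dict(zip(out["variable"], out["element_id"]))

    def translate(cell, lookup):
        # Empty or missing regulator lists stay empty.
        if pd.isna(cell) or str(cell).strip() == "":
            return ""
        variables = [v.strip() for v in str(cell).split(",")]
        # Variables without a lookup entry are passed through unchanged.
        return ",".join(str(lookup.get(v, v)) for v in variables)

    for sign in ("positive", "negative"):
        source = out[f"{sign}_regulators"]
        out[f"{sign}_regulator_names"] = source.map(lambda c: translate(c, names))
        out[f"{sign}_regulator_ids"] = source.map(lambda c: translate(c, ids))
    return out


# Toy model with three elements; regulator lists refer to variables.
model = pd.DataFrame(
    {
        "variable": ["akt_v", "mtor_v", "pten_v"],
        "element_name": ["AKT1", "MTOR", "PTEN"],
        "element_id": ["P31749", "P42345", "P60484"],
        "positive_regulators": ["", "akt_v", ""],
        "negative_regulators": ["pten_v", "pten_v,akt_v", ""],
    }
)
converted = add_regulator_names_sketch(model)
print(converted[["element_name", "positive_regulator_names", "negative_regulator_names"]])
```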
- violin.formatting.get_listname(idx: int, model_df: pandas.core.frame.DataFrame) → str
Creates the list name of an element from its attributes. This function generates a unique identifier for each element in the model network using the following rule:
listname: {element_name}_{element_type}_{element_subtype}_{compartment_ID}
For elements that have multiple types or subtypes, the identifier includes only the first entry.
If any attribute is empty, it is replaced with ‘nan’ in the list name.
These unique identifiers are then used by VIOLIN for further manipulation of the network information.
- Parameters
idx (int) – The row index of the element in the model file.
model_df (pd.DataFrame) – A dataframe of a model.
- Returns
listname – A formatted name for the regulator list column.
- Return type
str
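The naming rule above can be sketched in a few lines. This is a minimal illustration, not the library's code: the function name listname_sketch, the dataframe column names, and the assumption that multiple types or subtypes are separated by commas are all hypothetical.

```python
import pandas as pd

# Attribute columns in the order they appear in the list name (assumed names).
FIELDS = ("element_name", "element_type", "element_subtype", "compartment_id")
# Attributes for which only the first entry is kept.
MULTI_VALUED = {"element_type", "element_subtype"}


def listname_sketch(idx: int, model_df: pd.DataFrame) -> str:
    """Build {name}_{type}_{subtype}_{compartment} for the row at idx.

    Hypothetical stand-in for violin.formatting.get_listname.
    """
    parts = []
    for col in FIELDS:
        value = model_df.loc[idx, col]
        # Empty or missing attributes are written as 'nan'.
        if pd.isna(value) or str(value).strip() == "":
            parts.append("nan")
            continue
        text = str(value).strip()
        if col in MULTI_VALUED:
            # Assumption: multiple entries are comma-separated; keep the first.
            text = text.split(",")[0].strip()
        parts.append(text)
    return "_".join(parts)


# Toy model: the first element has two types, the second has missing attributes.
model = pd.DataFrame(
    {
        "element_name": ["AKT1", "GLUC"],
        "element_type": ["protein,gene", "chemical"],
        "element_subtype": ["kinase", None],
        "compartment_id": ["GO:0005737", ""],
    }
)
print(listname_sketch(0, model))
print(listname_sketch(1, model))
```

Writing missing attributes as 'nan' keeps the number of underscore-separated fields constant, so a list name can still be split back into its four parts.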
Dependencies
Python: pandas and NumPy libraries, as well as the os.path module
VIOLIN: none
Usage
This module is used during file input in the input/output module. For an example of using the convert functions, see Tutorial 4: Alternative Input.