Formatting (violin.formatting)
This page details the formatting functions of VIOLIN, which are used when the model and the machine reading output are input.
The formatting step is important because it:
- identifies duplicate interactions in the reading output,
- counts the number of times each interaction was found in the reading (the Evidence Score), and
- converts the variable representation of the model regulators into their common names.
The formatting functions are also responsible for inputting models and machine reading output that are not in the BioRECIPES or REACH format, respectively.
Functions
- formatting.evidence_score(reading_df, col_names)
This function merges duplicate interactions and calculates the evidence score of each LEE.
- Parameters
reading_df (pd.DataFrame) – The dataframe of the machine reading output
col_names (list) – The column headings used to determine whether two interactions are identical
- Returns
counted_reading – A new dataframe with the evidence count and PMCID list for each interaction
- Return type
pd.DataFrame
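The merge-and-count step can be pictured as a pandas group-by over the identifying columns. The following is a minimal sketch, not VIOLIN's implementation: the function name `evidence_score_sketch`, the `pmcid_col="Paper ID"` column, and the output headings "Evidence Score" and "Paper IDs" are hypothetical stand-ins chosen for illustration.

```python
import pandas as pd


def evidence_score_sketch(reading_df, col_names, pmcid_col="Paper ID"):
    """Hypothetical sketch: merge duplicate interactions and count evidence.

    Rows are treated as the same interaction when they agree on every
    column in col_names. The PMCID column name is an assumption.
    """
    counted = (
        reading_df.groupby(col_names, sort=True)[pmcid_col]
        .agg(**{
            # Evidence Score: how many times the interaction was extracted
            "Evidence Score": "size",
            # Sorted list of the distinct papers that reported it
            "Paper IDs": lambda ids: sorted(set(ids)),
        })
        .reset_index()
    )
    return counted
```

For example, two extractions of A → X from different papers collapse into a single row with an Evidence Score of 2 and both PMCIDs listed.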
- formatting.add_regulator_names_id(model_df)
This function converts the model regulator lists from BioRECIPE variables to the common element names and database identifiers.
- Parameters
model_df (pd.DataFrame) – The model dataframe (in BioRECIPE format)
- Returns
model_df – A new dataframe with added columns containing the positive and negative regulators listed by their Element Names and IDs
- Return type
pd.DataFrame
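The variable-to-name conversion amounts to a lookup from each BioRECIPE variable to its element name and ID. A minimal sketch follows, assuming the columns "Variable", "Element Name", "Element IDs", "Positive Regulator List" and "Negative Regulator List", and assuming regulator lists are plain comma-separated variable names (real BioRECIPE lists can carry extra notation this sketch ignores). The function name and the output columns "Positive Names", "Positive IDs", etc. are hypothetical.

```python
import pandas as pd


def add_regulator_names_id_sketch(model_df):
    """Hypothetical sketch: map regulator variables to element names and IDs.

    Column names are assumptions; regulator lists are assumed to be
    simple comma-separated variable names.
    """
    # Lookup tables keyed by the BioRECIPE variable name
    name_of = dict(zip(model_df["Variable"], model_df["Element Name"]))
    id_of = dict(zip(model_df["Variable"], model_df["Element IDs"]))
    out = model_df.copy()  # leave the caller's dataframe untouched
    for sign in ("Positive", "Negative"):
        # Treat missing lists as empty, then split on commas
        regs = out[f"{sign} Regulator List"].fillna("").astype(str)
        variables = regs.apply(
            lambda s: [v.strip() for v in s.split(",") if v.strip()]
        )
        # Unknown variables keep their variable name and get an empty ID
        out[f"{sign} Names"] = variables.apply(
            lambda vs: ",".join(name_of.get(v, v) for v in vs)
        )
        out[f"{sign} IDs"] = variables.apply(
            lambda vs: ",".join(str(id_of.get(v, "")) for v in vs)
        )
    return out
```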
- formatting.convert_to_biorecipes(model, att_list=[], separate=True)
This function imports a model that is NOT in the BioRECIPES format, such as a model formatted as a node-edge list. Regulators may be represented in the REACH format, separated by regulator sign, or unseparated, with a specified column for the regulator sign.
- Parameters
model (str) – Directory and filename of the file containing the model spreadsheet. Accepted files: .txt, .csv, .tsv, .xlsx
model_cols (list) – Column names of the model file. Default names are found in required_model.
att_list (list) – List of Element attributes (in addition to Name, ID, and Type). Default is no additional attributes.
separate (Boolean) – Whether the model lists regulators in separate Positive/Negative columns (True) or in a single column with a Regulator Sign attribute (False). Default is True.
- Returns
new_model – Formatted model dataframe
- Return type
pd.DataFrame
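For the unseparated case (separate=False), the conversion regroups signed edges into per-element regulator lists. The sketch below works on an already loaded dataframe rather than a file, and assumes hypothetical edge columns "Source", "Target" and "Sign" (with values "positive"/"negative"); the function name and output headings are illustrative, not VIOLIN's actual column set.

```python
import pandas as pd


def edges_to_regulator_lists_sketch(edges):
    """Hypothetical sketch: turn a signed node-edge list into one row per
    element with comma-separated positive and negative regulator lists.

    The Source/Target/Sign column names are assumptions.
    """
    # Every node that appears as a source or a target becomes an element
    nodes = sorted(set(edges["Source"]) | set(edges["Target"]))
    rows = []
    for node in nodes:
        incoming = edges[edges["Target"] == node]
        positive = incoming.loc[incoming["Sign"] == "positive", "Source"]
        negative = incoming.loc[incoming["Sign"] == "negative", "Source"]
        rows.append({
            "Element Name": node,
            "Positive Regulator List": ",".join(positive),
            "Negative Regulator List": ",".join(negative),
        })
    return pd.DataFrame(rows)
```

Nodes that only ever act as sources still get a row, with empty regulator lists, so every element in the network appears in the formatted model.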
- formatting.convert_reading(reading, action, atts=[])
This function formats the machine reading output in one of two ways: it either separates regulator names and attributes into ‘positive’ and ‘negative’ columns to match REACH formatting, or it combines regulator names and attributes without a regulator sign distinction and adds a ‘regulator sign’ column. The machine reading can be passed either as a filename or as an already loaded dataframe.
- Parameters
reading (str or pd.DataFrame) – Machine reading output, either as file location string or dataframe
action (str) – Action to be performed by the function. Accepts only ‘combine’ or ‘separate’ as input.
atts (list) – List of attributes associated with each regulator. Default list is [‘Type’, ‘ID’]. The list should not include regulator signs (where applicable).
- Returns
reading_df – A dataframe with the specified formatting completed
- Return type
pd.DataFrame
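The ‘combine’ action can be illustrated as stacking the positive-regulator and negative-regulator columns into one set of unsigned columns. This is a minimal sketch under assumed REACH-style headings of the form "Positive Regulator Name", "Negative Regulator Type", and so on; the function name, those headings, and the output "Regulator …" columns are hypothetical. A tuple default replaces the documented list default only to avoid a mutable default argument. The ‘separate’ action would be the reverse operation.

```python
import pandas as pd


def combine_reading_sketch(reading_df, atts=("Type", "ID")):
    """Hypothetical sketch of the 'combine' action: collapse signed
    regulator columns into unsigned ones plus a 'Regulator Sign' column.

    Assumes headings of the form '<Sign> Regulator <Field>'.
    """
    fields = ["Name", *atts]
    parts = []
    for sign, other in (("Positive", "Negative"), ("Negative", "Positive")):
        signed = [f"{sign} Regulator {f}" for f in fields]
        unsigned = [f"Regulator {f}" for f in fields]
        # Keep only rows that actually have a regulator of this sign
        part = reading_df.dropna(subset=[signed[0]]).copy()
        part = part.drop(columns=[f"{other} Regulator {f}" for f in fields])
        part = part.rename(columns=dict(zip(signed, unsigned)))
        part["Regulator Sign"] = sign.lower()
        parts.append(part)
    # A stable sort on the original index restores the reading's row order
    combined = pd.concat(parts).sort_index(kind="mergesort")
    return combined.reset_index(drop=True)
```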
Dependencies
Python: pandas and NumPy libraries, as well as the os.path module
VIOLIN: none
Usage
This module is used during file input in the input/output module. For an example of using the convert functions, see Tutorial 4: Alternative Input.