Formatting (`violin.formatting`)

This page details the formatting functions of VIOLIN, used during model and reading input.

The formatting step is important, as it:

identifies duplicate interactions in the reading output,
counts the number of times an interaction was found in the reading (Evidence Score),
converts the variable representation of the model regulators into their common names.

The formatting functions are also responsible for importing models and machine reading output that are not in the BioRECIPES or REACH format, respectively.

Functions

formatting.evidence_score(reading_df, col_names)[source]

This function merges duplicate interactions and calculates the evidence score of each literature extracted event (LEE).

Parameters

reading_df (pd.DataFrame) – The dataframe of the machine reading output
col_names (list) – The column headings used to determine whether two interactions are identical

Returns

counted_reading – A new dataframe with the evidence count and PMCID list for each interaction

Return type

pd.DataFrame
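
The merging and counting described above can be illustrated with plain pandas. This is a minimal sketch of the idea, not VIOLIN's implementation: the function name `evidence_score_sketch`, the output columns `Evidence_Score` and `PMCID_List`, and the input column names are all assumptions made for the example.

```python
import pandas as pd

def evidence_score_sketch(reading_df, col_names):
    """Hypothetical helper: merge rows that agree on col_names,
    count the duplicates, and collect the PMCIDs they came from."""
    grouped = reading_df.groupby(col_names, sort=False, as_index=False)
    return grouped.agg(
        # Number of times the interaction was found in the reading
        Evidence_Score=("PMCID", "count"),
        # Unique source papers, joined into one comma-separated string
        PMCID_List=("PMCID", lambda ids: ",".join(sorted(set(ids)))),
    )

# Toy reading output; the column names are assumed for illustration only
reading = pd.DataFrame({
    "Regulator Name": ["AKT1", "AKT1", "TP53"],
    "Regulated Name": ["MTOR", "MTOR", "MDM2"],
    "Sign": ["positive", "positive", "negative"],
    "PMCID": ["PMC1", "PMC2", "PMC3"],
})
counted = evidence_score_sketch(
    reading, ["Regulator Name", "Regulated Name", "Sign"]
)
print(counted)
```

Only the columns listed in `col_names` decide whether two rows count as the same interaction, so a shorter list merges more aggressively.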

formatting.add_regulator_names_id(model_df)[source]

This function converts the model regulator lists from BioRECIPE variables to the common element names and database identifiers

Parameters

model_df (pd.DataFrame) – The model dataframe (in BioRECIPE format)

Returns

model_df – A new dataframe with added columns containing the positive and negative regulators listed by their Element Names and IDs

Return type

pd.DataFrame
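
To make the variable-to-name conversion concrete, the sketch below builds a lookup from model variables to element names and applies it to the regulator lists. It is an illustration under assumptions, not the library's code: the column names, the helper `names_for`, and the added `Positive Names` / `Negative Names` columns are hypothetical, and regulator lists are assumed to be plain comma-separated variables with no logical notation.

```python
import pandas as pd

def names_for(regulators, var_to_name):
    """Hypothetical helper: translate a comma-separated list of model
    variables into the matching element names."""
    if not regulators:
        return ""
    variables = [v.strip() for v in regulators.split(",")]
    # Variables missing from the lookup are passed through unchanged
    return ",".join(var_to_name.get(v, v) for v in variables)

# Toy model; column names are assumed for illustration only
model = pd.DataFrame({
    "Variable": ["AKT_pn", "MTOR_pn", "PTEN_pn"],
    "Element Name": ["AKT1", "MTOR", "PTEN"],
    "Positive Regulators": ["", "AKT_pn", ""],
    "Negative Regulators": ["PTEN_pn", "", ""],
})

var_to_name = dict(zip(model["Variable"], model["Element Name"]))
model["Positive Names"] = model["Positive Regulators"].apply(
    names_for, args=(var_to_name,)
)
model["Negative Names"] = model["Negative Regulators"].apply(
    names_for, args=(var_to_name,)
)
print(model[["Variable", "Positive Names", "Negative Names"]])
```

The same lookup pattern would extend to database identifiers by building a second dictionary keyed on the variable column.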

formatting.convert_to_biorecipes(model, att_list=[], separate=True)[source]

This function imports a model which is NOT in the BioRECIPES format, such as a model formatted as a node-edge list. Regulators may be represented in the REACH format, separated by regulator sign, or unseparated, with a specified column for regulator sign.

Parameters

model (str) – Directory and filename of the file containing the model BioRECIPES spreadsheet. Accepted files: .txt, .csv, .tsv, .xlsx
model_cols (list) – Column names of the model file. Default names are found in required_model
att_list (list) – List of Element attributes (in addition to Name, ID, and Type). Default is no additional attributes
separate (Boolean) – Whether the model presents regulators in separate Positive/Negative columns (True) or in a single column with a Regulator Sign attribute (False). Default is True

Returns

new_model – Formatted model dataframe

Return type

pd.DataFrame

formatting.convert_reading(reading, action, atts=[])[source]

This function formats the machine reading output, either separating regulator names and attributes into ‘positive’ and ‘negative’ columns to match REACH formatting, or combining regulator names and attributes without regulator sign distinction, and adding a ‘regulator sign’ column. This function can take the machine reading as either a filename or as an already uploaded dataframe.

Parameters

reading (str or pd.DataFrame) – Machine reading output, either as file location string or dataframe
action (str) – Action to be performed by the function. Accepts only ‘combine’ or ‘separate’ as input
atts (list) – List of attributes associated with each regulator. Default list is [‘Type’, ‘ID’]. The list should not include regulator signs (where applicable)

Returns

reading_df – A dataframe with the specified formatting completed

Return type

pd.DataFrame
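
The ‘separate’ action can be pictured as splitting one regulator column into two by sign. The following is a minimal pandas sketch of that idea under stated assumptions, not VIOLIN's implementation: the function name `separate_sketch` and every column name are hypothetical, and only the regulator name (no additional attributes) is handled.

```python
import pandas as pd

def separate_sketch(reading_df):
    """Hypothetical helper: split a single regulator column into
    positive and negative columns using the regulator sign."""
    out = reading_df.drop(columns=["Regulator Name", "Regulator Sign"]).copy()
    is_positive = reading_df["Regulator Sign"].str.lower() == "positive"
    # Keep the name where the sign matches; leave the other column empty
    out["Positive Regulator Name"] = reading_df["Regulator Name"].where(is_positive, "")
    out["Negative Regulator Name"] = reading_df["Regulator Name"].where(~is_positive, "")
    return out

# Toy combined-format reading; column names are assumed for illustration
combined = pd.DataFrame({
    "Regulated Name": ["MTOR", "MDM2"],
    "Regulator Name": ["AKT1", "TP53"],
    "Regulator Sign": ["positive", "negative"],
})
separated = separate_sketch(combined)
print(separated)
```

The ‘combine’ action would run the other way, folding the two sign-specific columns back into one name column plus a sign column.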

Dependencies

Python: pandas and NumPy libraries, as well as the os.path module

VIOLIN: none

Usage

This module is used during file input in the input/output module. For an example of using the convert functions, see Tutorial 4: Alternative Input.
