Python Library API
Causal Relationship Discovery
ensemble
Inference by heterogeneous parallel base estimators
def ensemble(data:pd.DataFrame)->List[np.ndarray]:
Inputs
- data: pd.DataFrame, i.e. data from a time-independent steady-state system, or experimental cross-sectional data obtained by randomized statistics.
Table 1. Example input data
       A       B        C
1   18.5   129.5      243
2     56   492.5     1426
3  250.5    2769  14043.5
Outputs
- candidates: List[np.ndarray], i.e. the outputs of multiple base estimators (constraint-based, score-based, permutation-based, linear Gaussian acyclic, continuous optimization)
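The input expected by ensemble can be built directly with pandas. The following is a minimal sketch that reproduces the three-row example table above; the call to ensemble is left commented out because the import path of the library is not given in this reference.

```python
import pandas as pd

# Rows are independent observations; columns are the variables (graph nodes).
data = pd.DataFrame(
    {
        "A": [18.5, 56.0, 250.5],
        "B": [129.5, 492.5, 2769.0],
        "C": [243.0, 1426.0, 14043.5],
    },
    index=[1, 2, 3],
)

n_nodes = data.shape[1]  # one graph node per column
# candidates = ensemble(data)  # import path not stated in this reference
```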
voting
Soft-voting machine that integrates the base estimates
def voting(candidates:List[np.ndarray], priori:pd.DataFrame)->np.ndarray:
Inputs
- candidates: List[np.ndarray], i.e. the return value of ensemble
- priori: np.ndarray, i.e. the adjacency matrix of a partial graph containing prior knowledge
Outputs
- fine_graph: the predicted causal directed graph
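To illustrate the idea of soft voting over base estimates, here is a minimal sketch. It is not the library's implementation: the helper name soft_vote, the threshold parameter, and the encoding of prior knowledge (1 = edge required, 0 = edge forbidden, NaN = unknown) are all assumptions made for this example, as is the premise that each candidate is an n x n matrix of edge scores in [0, 1].

```python
from typing import List, Optional

import numpy as np


def soft_vote(candidates: List[np.ndarray],
              prior: Optional[np.ndarray] = None,
              threshold: float = 0.5) -> np.ndarray:
    """Hypothetical soft vote: average the edge scores, then threshold.

    Assumes entry [i, j] of each candidate scores the edge i -> j.
    """
    scores = np.mean(np.stack(candidates), axis=0)
    if prior is not None:
        # Entries fixed by prior knowledge override the averaged vote.
        known = ~np.isnan(prior)
        scores = np.where(known, prior, scores)
    graph = (scores >= threshold).astype(int)
    np.fill_diagonal(graph, 0)  # no self-loops
    return graph


candidates = [
    np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]], dtype=float),
    np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]], dtype=float),
    np.array([[0, 0, 0], [1, 0, 1], [0, 0, 0]], dtype=float),
]
prior = np.full((3, 3), np.nan)
prior[0, 2] = 0  # assumed prior knowledge: A does not cause C
fine_graph = soft_vote(candidates, prior)
```

Averaging followed by a threshold does not by itself guarantee an acyclic result, so a real implementation would likely need an additional acyclicity check.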
Given Graph Falsification
falsifypack
Inputs
- data_path: str. Path to the data file, which must be a CSV file with content laid out like Table 1.
- graph_path: str. Path to the graph file, which must be in GML (Graph Modelling Language) format. A GML file consists of hierarchical key-value lists, and graphs can be annotated with arbitrary data structures. GML is the standard file format of the Graphlet graph editor system.
- to_plot: boolean. True to plot the permutation/fraction-of-violations figure.
- significance_level = 0.05: float. Significance level for the permutation test.
- significance_ci = 0.05: float. Significance level for the (conditional) independence tests.
- n_perm = 100: int. Number of permutations to perform. If -1, all n_nodes! permutations are used.
Outputs
- results: str, i.e. a text blob of the falsification summary, which contains a set of metrics.
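The role of n_perm can be sketched as follows. This is only an illustration under stated assumptions, not the code behind falsifypack: the helpers node_permutations and permutation_p_value are hypothetical, and the p-value definition used here (the fraction of permuted graphs with no more violations than the given graph) is an assumed reading of "permutation test".

```python
import itertools
import random
from typing import List, Sequence, Tuple


def node_permutations(nodes: Sequence[str], n_perm: int = 100,
                      seed: int = 0) -> List[Tuple[str, ...]]:
    """Hypothetical helper: node relabelings for a permutation test."""
    if n_perm == -1:
        # Exhaustive case: all n_nodes! orderings.
        return list(itertools.permutations(nodes))
    rng = random.Random(seed)
    perms = []
    for _ in range(n_perm):
        shuffled = list(nodes)
        rng.shuffle(shuffled)
        perms.append(tuple(shuffled))
    return perms


def permutation_p_value(observed: float, permuted: Sequence[float]) -> float:
    """Fraction of permuted graphs with at most as many violations as observed."""
    return sum(v <= observed for v in permuted) / len(permuted)


all_perms = node_permutations(["A", "B", "C"], n_perm=-1)  # 3! = 6 orderings
p_value = permutation_p_value(0.1, [0.4, 0.1, 0.6, 0.5])   # 1 of 4 -> 0.25
```

Because the number of orderings grows factorially with the number of nodes, n_perm = -1 is only practical for small graphs.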
Kink & Discontinuity Regression
kink_regression_2d
Uses local piecewise linear regression to perform kink regression and discontinuity regression; applies to time-series data and general x-y data
Inputs
- file_path: str, i.e. path to the time-series or general x-y data file. File format: .csv
- xl: float. Lower x-bound of the local regression
- xh: float. Upper x-bound of the local regression
- iplot: boolean. True to plot the x-y regression figure
Outputs
- best_knot_x: the x-coordinate of the kink with the lowest residual error
- best_model: an object of the piecewise regression model class
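The search for best_knot_x can be sketched as a grid search over candidate knots inside [xl, xh], fitting a continuous piecewise-linear model by least squares at each candidate and keeping the one with the lowest residual error. The function fit_kink and its n_candidates parameter are hypothetical and only illustrate the technique; the library's own model class may work differently.

```python
import numpy as np


def fit_kink(x, y, xl, xh, n_candidates=50):
    """Grid-search one knot in (xl, xh) for a continuous piecewise-linear fit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = (x >= xl) & (x <= xh)  # keep only the local window
    x, y = x[mask], y[mask]
    best = None
    # Drop the interval endpoints so both segments contain data.
    for knot in np.linspace(xl, xh, n_candidates + 2)[1:-1]:
        # Basis: intercept, slope, and change of slope after the knot.
        design = np.column_stack([np.ones_like(x), x, np.maximum(x - knot, 0.0)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = float(np.sum((design @ coef - y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, float(knot), coef)
    return best[1], best[2]


x = np.linspace(0.0, 10.0, 101)
y = np.where(x < 4.0, 2.0 * x, 8.0 - (x - 4.0))  # synthetic data, kink at x = 4
best_knot_x, coef = fit_kink(x, y, xl=1.0, xh=9.0, n_candidates=79)
```

The hinge term max(x - knot, 0) models a change of slope only. Capturing a discontinuity (a jump) would require an additional step term such as (x >= knot) in the design matrix.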
kink_regression_3d
Uses higher-dimensional local piecewise linear regression to perform kink regression and discontinuity regression; applies to x-y-target data
Inputs
- file_path: str, i.e. path to the x-y-z data file. File format: .csv
- xl, xh, yl, yh: float, i.e. the lower and upper x- and y-bounds of the local regression
- iplot: boolean. True to plot the regression figure
- response_surface_design: boolean. True to build the response surface
Outputs
- best_knot_x, best_knot_y: the xy-coordinates of the kink with the lowest residual error
- best_model: an object of the piecewise regression model class
- filtered_data: points of O(mintarget) in the response surface; returned only if response_surface_design == True
Response Surface Methodology
bayes_opt
def bayes_opt(df:pd.DataFrame, ploy_dim=2, init_p=5, iters=25, verbose_=2, mode='maximize'):
Inputs
- df: pd.DataFrame with columns "X", "Y", "Z".
- ploy_dim: Degree of the polynomial surrogate model.
- init_p: Number of random points to probe before starting the optimization.
- iters: Number of iterations in which the method attempts to find the maximum value.
- verbose_: The level of verbosity. verbose_ = 2 prints all probed points, verbose_ = 1 prints only when a maximum is observed, verbose_ = 0 is silent.
- mode: str. 'maximize' or 'minimize'
Outputs
- results: json. {'target': _, 'params': {'x': _, 'y': _}}
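The idea of a polynomial surrogate can be illustrated with a simplified stand-in: fit a polynomial surface of the given degree to the "X", "Y", "Z" columns by least squares and pick the best point on a grid. This is not Bayesian optimization and not the body of bayes_opt; the names poly_features, surrogate_optimum, and the grid parameter are hypothetical, and poly_dim here merely mirrors the role of ploy_dim.

```python
import itertools

import numpy as np
import pandas as pd


def poly_features(x, y, degree):
    """Columns x**i * y**j for all i + j <= degree."""
    terms = [(i, j) for i, j in itertools.product(range(degree + 1), repeat=2)
             if i + j <= degree]
    return np.column_stack([x**i * y**j for i, j in terms])


def surrogate_optimum(df, poly_dim=2, mode="maximize", grid=101):
    """Fit a polynomial surface to Z(X, Y) and return its best grid point."""
    x, y, z = (df[c].to_numpy(dtype=float) for c in ("X", "Y", "Z"))
    coef, *_ = np.linalg.lstsq(poly_features(x, y, poly_dim), z, rcond=None)
    gx, gy = np.meshgrid(np.linspace(x.min(), x.max(), grid),
                         np.linspace(y.min(), y.max(), grid))
    gx, gy = gx.ravel(), gy.ravel()
    pred = poly_features(gx, gy, poly_dim) @ coef
    k = int(np.argmax(pred) if mode == "maximize" else np.argmin(pred))
    # Same layout as the documented results object.
    return {"target": float(pred[k]),
            "params": {"x": float(gx[k]), "y": float(gy[k])}}


# Synthetic data with a known maximum of 5 at (1, -0.5).
rng = np.random.default_rng(0)
pts = rng.uniform(-2.0, 2.0, size=(200, 2))
df = pd.DataFrame({"X": pts[:, 0], "Y": pts[:, 1]})
df["Z"] = 5.0 - (df["X"] - 1.0) ** 2 - (df["Y"] + 0.5) ** 2
results = surrogate_optimum(df)
```

A grid search is only feasible in low dimensions; it is used here to keep the example free of extra dependencies.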
Input-Output Module
load_data
Inputs
- path: str, i.e. the path of the data file, either absolute or relative to the current directory
- type: str, i.e. the type of the file. Supported types are 'csv', 'gse', and 'e-mtab', which refer to common .csv-like files, GEO "Series Matrix File(s)", and ArrayExpress "E-MTAB" files respectively.
Return
- processed_data: DataFrame, i.e. valid data that the functions above and the API of the Python library accept
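As an illustration of the 'gse' case, a GEO Series Matrix file is a tab-separated table surrounded by metadata lines that start with '!'. The sketch below reads such a file with pandas; read_series_matrix is a hypothetical helper, not the code behind load_data, and any further processing that load_data applies (for example transposition or filtering) is not covered here.

```python
import io

import pandas as pd


def read_series_matrix(source) -> pd.DataFrame:
    """Hypothetical reader for a GEO Series Matrix file (path or file-like)."""
    # Lines starting with '!' are metadata and are skipped as comments;
    # the remaining lines form a tab-separated table keyed by ID_REF.
    return pd.read_csv(source, sep="\t", comment="!", index_col=0)


# Tiny in-memory stand-in for a real Series Matrix file.
sample = io.StringIO(
    '!Series_title\t"demo"\n'
    "!series_matrix_table_begin\n"
    '"ID_REF"\t"GSM1"\t"GSM2"\n'
    '"probe_1"\t1.5\t2.5\n'
    '"probe_2"\t3.0\t4.0\n'
    "!series_matrix_table_end\n"
)
expr = read_series_matrix(sample)
```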
array_to_gml
Inputs
- arr: np.ndarray, i.e. the adjacency matrix of the causal graph
- file_name: str, i.e. the name of the file saved in the current directory
- labels: list[str], i.e. the node labels; e.g. a 3x3 array represents 3 nodes ['A','B','C']
Outputs
- file: .gml. GML is the Graph Modelling Language. Its key features are portability, simple syntax, extensibility, and flexibility. A GML file consists of hierarchical key-value lists, and graphs can be annotated with arbitrary data structures. The idea for a common file format was born at GD'95; the proposal is the outcome of many discussions. GML is the standard file format of the Graphlet graph editor system and has been adopted and adapted by several other graph-drawing systems.
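To show what the hierarchical key-value structure of GML looks like for an adjacency matrix, here is a minimal sketch that renders the text by hand. The helper adjacency_to_gml is hypothetical and is not the implementation of array_to_gml; it assumes that a nonzero entry [i][j] denotes a directed edge from node i to node j.

```python
from typing import List, Sequence


def adjacency_to_gml(arr: Sequence[Sequence[float]], labels: List[str]) -> str:
    """Hypothetical helper: render a directed adjacency matrix as GML text."""
    lines = ["graph [", "  directed 1"]
    for i, label in enumerate(labels):
        lines += ["  node [", f"    id {i}", f'    label "{label}"', "  ]"]
    for i, row in enumerate(arr):
        for j, weight in enumerate(row):
            if weight != 0:  # nonzero entry [i][j] is read as an edge i -> j
                lines += ["  edge [", f"    source {i}", f"    target {j}", "  ]"]
    lines.append("]")
    return "\n".join(lines) + "\n"


# Chain A -> B -> C; a NumPy array can be passed the same way as nested lists.
gml_text = adjacency_to_gml([[0, 1, 0], [0, 0, 1], [0, 0, 0]], ["A", "B", "C"])
```

Writing gml_text to a file with a .gml extension yields a file that GML-aware tools should be able to open.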
gml_to_array
Inputs
- file_path: str, i.e. path to a .gml (Graph Modelling Language) file. A GML file consists of hierarchical key-value lists, and graphs can be annotated with arbitrary data structures. GML is the standard file format of the Graphlet graph editor system.
Return
- array: np.ndarray, i.e. the adjacency matrix of causal graph