General Tutorial¶
Here is a basic end-to-end tutorial for using the library:
Benchmarking with the Sachs dataset¶
Karen Sachs et al. Causal Protein-Signaling Networks Derived from Multiparameter Single-Cell Data. Science 308, 523-529 (2005). DOI: 10.1126/science.1105809
In [2]:
import expAscribe as ea
data = ea.load_data('./data/cyto_full_data.csv',type='csv')
In [9]:
data
Out[9]:
| | praf | pmek | plcg | PIP2 | PIP3 | p44/42 | pakts473 | PKA | PKC | P38 | pjnk |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 26.4 | 13.20 | 8.82 | 18.30 | 58.80 | 6.61 | 17.0 | 414.0 | 17.00 | 44.90 | 40.00 |
1 | 35.9 | 16.50 | 12.30 | 16.80 | 8.13 | 18.60 | 32.5 | 352.0 | 3.37 | 16.50 | 61.50 |
2 | 59.4 | 44.10 | 14.60 | 10.20 | 13.00 | 14.90 | 32.5 | 403.0 | 11.40 | 31.90 | 19.50 |
3 | 73.0 | 82.80 | 23.10 | 13.50 | 1.29 | 5.83 | 11.8 | 528.0 | 13.70 | 28.60 | 23.10 |
4 | 33.7 | 19.80 | 5.19 | 9.73 | 24.80 | 21.10 | 46.1 | 305.0 | 4.66 | 25.70 | 81.30 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7461 | 49.1 | 12.40 | 32.80 | 27.90 | 22.70 | 11.70 | 38.2 | 1144.0 | 1.00 | 2.55 | 1.00 |
7462 | 23.3 | 4.61 | 17.80 | 22.10 | 14.90 | 48.70 | 67.3 | 922.0 | 1.00 | 9.82 | 1.00 |
7463 | 28.1 | 4.49 | 18.80 | 20.20 | 10.20 | 3.08 | 21.9 | 730.0 | 1.00 | 1.75 | 2.00 |
7464 | 34.6 | 7.10 | 5.73 | 20.70 | 15.10 | 32.20 | 41.4 | 813.0 | 44.50 | 1382.00 | 2.44 |
7465 | 30.5 | 1.01 | 7.30 | 173.00 | 22.90 | 6.61 | 13.7 | 890.0 | 1.00 | 1.00 | 1.65 |
7466 rows × 11 columns
In addition, the package supports the GSE standard for GEO databases and the E-MTAB standard for ArrayExpress databases.
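A loader of this kind can be pictured as a thin dispatch over pandas readers. The sketch below is an illustrative assumption about the behavior of the different `type` values, not the library's actual implementation (the function name `load_data_sketch` is hypothetical):

```python
import io
import pandas as pd

def load_data_sketch(source, type='csv'):
    """Illustrative dispatch over input formats (assumed behavior).

    'csv'    -> plain comma-separated table
    'gse'    -> GEO series-matrix file (tab-separated, '!' metadata lines)
    'e-mtab' -> ArrayExpress-style tab-separated table
    """
    if type == 'csv':
        return pd.read_csv(source)
    elif type == 'gse':
        # GEO series-matrix files prefix metadata lines with '!'
        return pd.read_csv(source, sep='\t', comment='!')
    elif type == 'e-mtab':
        return pd.read_csv(source, sep='\t')
    raise ValueError(f"unsupported type: {type}")

# Usage with an in-memory CSV instead of a file path
df = load_data_sketch(io.StringIO("praf,pmek\n26.4,13.2\n35.9,16.5"))
print(df.shape)  # (2, 2)
```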
In [5]:
data1 = ea.load_data('./data/GSE150859_series_matrix.txt',type='gse')
data1
Out[5]:
| | V5-GFP | V5-DDX39A | V5-DDX56 |
---|---|---|---|
0 | 18.5 | 129.5 | 243 |
1 | 56 | 492.5 | 1426 |
2 | 250.5 | 2769 | 14043.5 |
3 | 49 | 188 | 368.5 |
4 | 33 | 301 | 357 |
... | ... | ... | ... |
9370 | 13.5 | 288.5 | 414 |
9371 | 12 | 78.5 | 53 |
9372 | 14.5 | 72 | 13.5 |
9373 | -2 | 1305 | 32.5 |
9374 | 3.5 | 51 | 36.5 |
9375 rows × 3 columns
In [10]:
data2 = ea.load_data('./data/E-MTAB/E-MTAB-1881/nordata_housekeeping9.txt',type='e-mtab')
data2
Out[10]:
| | ZJXT | YYFT | CXCT | GJFT | HSPT | CGFT | SHST | LYXT | CFDT | QLTT | XSHT | XYT | CXMT | LCMT | LHST | MGFT | JCJT | QALT | WJLT | WDZT |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 9.111150 | 9.232378 | 8.857294 | 9.271755 | 9.927486 | 10.240846 | 8.952556 | 10.385333 | 9.796349 | 8.087126 | 8.973760 | 8.406488 | 10.813435 | 11.556793 | 10.120671 | 8.212669 | 9.388566 | 7.873331 | 9.095918 | 9.755586 |
1 | 3.227157 | 3.063091 | 5.119717 | 4.483274 | 2.041509 | 1.187559 | 1.348744 | 3.291911 | 2.070610 | 4.394380 | 3.241642 | 3.790899 | 5.617061 | 13.646993 | 5.779304 | 5.645933 | 5.308709 | 4.572374 | 5.999214 | 4.951806 |
2 | 7.713586 | 10.059047 | 7.725828 | 6.104085 | 8.439928 | 9.473998 | 8.744809 | 6.049309 | 8.741083 | 6.640744 | 7.155171 | 8.397895 | 7.312339 | 11.171599 | 6.714994 | 6.397142 | 10.430565 | 8.199285 | 8.007745 | 8.355071 |
3 | 2.388695 | 1.829009 | 11.942279 | 8.122652 | 4.259124 | 1.111801 | 1.348744 | 12.259561 | 4.707813 | 2.407039 | 14.652141 | 2.865968 | 5.643495 | 10.663609 | 5.126418 | 5.454684 | 6.370328 | 6.030684 | 2.580007 | 7.273686 |
4 | 14.724577 | 14.428101 | 14.085655 | 14.216032 | 14.109368 | 14.725469 | 14.510718 | 14.613923 | 14.255838 | 13.952195 | 14.175162 | 14.002354 | 14.670586 | 13.177121 | 15.266736 | 15.083636 | 15.987569 | 14.552032 | 14.059699 | 14.826053 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1951 | 17.964130 | 18.214308 | 17.903526 | 17.607185 | 17.725956 | 17.459050 | 17.319977 | 17.240541 | 18.047880 | 18.387932 | 17.922150 | 17.881990 | 16.723476 | 18.962051 | 16.701130 | 17.202204 | 18.310410 | 17.266232 | 18.330746 | 18.065432 |
1952 | 7.034416 | 7.354255 | 9.233629 | 7.532152 | 7.505271 | 8.259603 | 8.070838 | 10.593454 | 7.032317 | 6.852897 | 8.575504 | 7.075691 | 4.027886 | 11.826261 | 11.776199 | 8.252106 | 8.263076 | 10.856247 | 10.922974 | 9.101564 |
1953 | 14.073858 | 14.743323 | 12.860473 | 13.280211 | 14.194317 | 14.040255 | 14.073015 | 13.100442 | 13.341913 | 12.861149 | 13.096608 | 13.127536 | 14.674112 | 13.699254 | 13.821383 | 13.816957 | 13.254840 | 14.017818 | 13.426982 | 12.988242 |
1954 | 10.559853 | 9.545756 | 10.690983 | 10.768274 | 8.824124 | 10.002195 | 8.906384 | 9.748693 | 9.722105 | 10.143320 | 10.510729 | 9.854776 | 9.447465 | 11.942350 | 9.384220 | 9.848043 | 10.878305 | 10.394729 | 11.319130 | 10.136945 |
1955 | 12.441194 | 12.217199 | 11.506890 | 11.731279 | 11.679990 | 12.470256 | 11.854928 | 11.824907 | 11.815837 | 11.402479 | 11.214536 | 11.180319 | 12.729002 | 12.115368 | 12.129157 | 12.023438 | 11.373691 | 10.350244 | 10.754209 | 10.672611 |
1956 rows × 20 columns
Here, we use the Sachs dataset for the following calculations.
Inference by heterogeneous parallel base-estimators¶
Base-estimators include ICALiNGAM, FCI, GES, GRaSP, and DagmaLinear.
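The fan-out over base-estimators can be pictured as running each algorithm in its own worker and collecting one candidate adjacency matrix per estimator. In the sketch below, `dummy_estimator` is a stand-in: the real ICALiNGAM, FCI, GES, GRaSP, and DagmaLinear estimators would each run their own discovery algorithm on the data instead.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def dummy_estimator(seed):
    # Stand-in for one base-estimator: a random 11x11 binary adjacency matrix
    rng = np.random.default_rng(seed)
    return (rng.random((11, 11)) > 0.8).astype(int)

names = ['ICALiNGAM', 'FCI', 'GES', 'GRaSP', 'DagmaLinear']

# Fan the estimators out over worker threads; collect one candidate graph each
with ThreadPoolExecutor() as pool:
    futures = {name: pool.submit(dummy_estimator, i) for i, name in enumerate(names)}
    candidates = {name: f.result() for name, f in futures.items()}

print(sorted(candidates))       # the five estimator names
print(candidates['GES'].shape)  # (11, 11)
```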
In [10]:
candidates = ea.ensemble(data)
0%| | 0/11 [00:00<?, ?it/s]
X2 --> X1
X3 --> X4
X7 --> X3
X3 --> X8
X3 --> X10
X3 --> X11
X7 --> X11
X10 --> X11
GRaSP edge count: 29
GRaSP completed in: 0.61s
0%| | 0/180000.0 [00:00<?, ?it/s]
Soft Voting Machine¶
The results of each of the above models are weighted by a logistic regression meta-learner.
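The weighting step can be pictured as a stacking classifier: each candidate graph contributes one column of flattened edge indicators, and a logistic regression learns how much to trust each base-estimator. The synthetic graphs and variable names below are illustrative assumptions, not the library's implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
true = (rng.random((11, 11)) > 0.8).astype(int)   # stand-in for the meta-labels
candidates = [np.clip(true + (rng.random((11, 11)) > 0.9).astype(int), 0, 1)
              for _ in range(5)]                   # five noisy candidate graphs

# One column of flattened edge indicators per base-estimator
X = np.stack([c.flatten() for c in candidates], axis=1)   # shape (121, 5)
y = true.flatten()

# The logistic regression weights encode per-estimator trust (soft voting)
clf = LogisticRegression().fit(X, y)
fine = clf.predict(X).reshape(11, 11)
print(fine.shape)  # (11, 11)
```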
In [11]:
import pandas as pd
y_meta = pd.read_csv("./data/G_partial.csv",header=None)
y_true = pd.read_csv("./data/G_true.csv",header=None)
'''
y_meta: prior knowledge (partial graph)
y_true: ground-truth consensus graph
'''
fine_graph = ea.voting(candidates,y_meta)
In [12]:
fine_graph
Out[12]:
array([[0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0],
       [1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0],
       [0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1],
       [0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0],
       [1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0],
       [0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
       [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0]])
In [17]:
import numpy as np
from sklearn.metrics import accuracy_score, recall_score, f1_score
def score_final_adj_matrix(final_adj_matrix: np.ndarray, y_true: np.ndarray):
    y_true_flat = y_true.flatten()
    y_pred_flat = final_adj_matrix.flatten()
    y_true_binary = (y_true_flat > 0).astype(int)
    y_pred_binary = (y_pred_flat > 0).astype(int)
    accuracy = accuracy_score(y_true_binary, y_pred_binary)
    recall = recall_score(y_true_binary, y_pred_binary)
    f1 = f1_score(y_true_binary, y_pred_binary)
    return accuracy, recall, f1

accuracy, recall, f1 = score_final_adj_matrix(fine_graph, y_true.to_numpy())
print(f"Accuracy: {accuracy:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")
Accuracy: 0.7107
Recall: 0.4211
F1 Score: 0.3137
In [15]:
from expAscribe.score import ratio_of_paths
'''
ratio_of_paths:
returns the fraction of directed paths in the true graph recovered by the predicted graph
'''
print(ratio_of_paths(fine_graph, y_true.to_numpy()))
1.0
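One plausible reading of this metric, sketched via transitive closure (the helper names below are hypothetical and the library's actual implementation may differ): the fraction of ancestor-to-descendant pairs in the true graph that remain connected in the predicted graph.

```python
import numpy as np

def reachability(adj):
    # Transitive closure: reach[i, j] = 1 iff a directed path i -> j exists
    n = adj.shape[0]
    reach = (adj > 0).astype(int)
    for _ in range(n):
        reach = ((reach + reach @ adj) > 0).astype(int)
    return reach

def ratio_of_paths_sketch(pred, true):
    true_paths = reachability(true)
    pred_paths = reachability(pred)
    n_true = true_paths.sum()
    # Share of true directed paths also present in the prediction
    return (true_paths * pred_paths).sum() / n_true if n_true else 1.0

true = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])  # chain 0 -> 1 -> 2
pred = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]])  # adds a shortcut 0 -> 2
print(ratio_of_paths_sketch(pred, true))  # 1.0
```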
Falsification of User-Given Directed Acyclic Graphs¶
In [12]:
ana = ea.falsifypack("./data/cyto_full_data.csv","./data/falsify_sachs.gml")
Test permutations of given graph: 100%|██████████| 100/100 [09:10<00:00, 5.51s/it]
In [13]:
print(ana)
+-------------------------------------------------------------------------------------------------------+
|                                         Falsification Summary                                         |
+-------------------------------------------------------------------------------------------------------+
| The given DAG is informative because 0 / 100 of the permutations lie in the Markov                    |
| equivalence class of the given DAG (p-value: 0.00).                                                   |
| The given DAG violates 21/49 LMCs and is better than 98.0% of the permuted DAGs (p-value: 0.02).      |
| Based on the provided significance level (0.05) and because the DAG is informative,                   |
| we do not reject the DAG.                                                                             |
+-------------------------------------------------------------------------------------------------------+
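The p-values in a summary like this come from a permutation test: the nodes of the given DAG are permuted many times, and the given DAG's violation count is ranked against those of the permuted DAGs. A toy sketch of that ranking step, using made-up violation counts (the real test evaluates local Markov conditions against the data):

```python
import numpy as np

rng = np.random.default_rng(0)

given_violations = 21  # the given DAG violates 21/49 LMCs (see summary above)

# Stand-in violation counts for 100 node-permuted DAGs; in this toy draw
# every permutation violates at least 30 LMCs, i.e. all are worse.
permuted_violations = rng.integers(30, 49, size=100)

# p-value: fraction of permutations at least as good as the given DAG
p_value = np.mean(permuted_violations <= given_violations)
print(p_value)  # 0.0 -> no permuted DAG beats the given one in this draw
```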
In [64]:
ea.drawgraph(fine_graph,['praf','pmek','plcg','PIP2','PIP3','p44/42','pakts473','PKA','PKC','P38','pjnk'])
Regression discontinuity & Regression kink design¶
Implemented via piecewise linear fitting.
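The idea can be sketched as an OLS fit with a hinge term plus a grid search over the knot location. The data and names below are synthetic stand-ins for what `kink_regression_2d` does internally; the library may search differently:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1.5, 0.8, 50))
true_knot = -0.3
# Piecewise-linear response: the slope changes by +7.3 at the knot
y = 2.5 - 1.7 * x + 7.3 * np.maximum(x - true_knot, 0) + rng.normal(0, 0.3, 50)

best_sse, best_knot = np.inf, None
for knot in np.linspace(-1.4, 0.7, 50):      # candidate knot positions
    kink = np.maximum(x - knot, 0)           # hinge term, zero left of the knot
    X = np.column_stack([np.ones_like(x), x, kink])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sse = ((y - X @ beta) ** 2).sum()
    if sse < best_sse:
        best_sse, best_knot = sse, knot

print(best_knot)  # close to the true knot at -0.3
```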
In [27]:
kink_knot, kink_model = ea.kink_regression_2d('./data/manually_labeled/kink_test_2d.csv', xl = -1.5, xh = 0.8, iplot = True)
In [3]:
print(kink_knot)
-0.3157894736842106
In [4]:
print(kink_model.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                      Y   R-squared:                       0.893
Model:                            OLS   Adj. R-squared:                  0.886
Method:                 Least Squares   F-statistic:                     128.2
Date:                Thu, 29 Aug 2024   Prob (F-statistic):           2.40e-22
Time:                        23:53:33   Log-Likelihood:                -73.098
No. Observations:                  50   AIC:                             154.2
Df Residuals:                      46   BIC:                             161.8
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.5944      0.542      4.784      0.000       1.503       3.686
X             -1.6989      0.464     -3.662      0.001      -2.633      -0.765
D              0.2334      0.528      0.442      0.661      -0.830       1.297
kink           7.3120      0.577     12.676      0.000       6.151       8.473
==============================================================================
Omnibus:                        0.890   Durbin-Watson:                   2.013
Prob(Omnibus):                  0.641   Jarque-Bera (JB):                0.878
Skew:                           0.130   Prob(JB):                        0.645
Kurtosis:                       2.405   Cond. No.                         7.82
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [2]:
kink_knot_x, kink_knot_y, kink_model_3d, filtered_samples = ea.kink_regression_3d('./data/manually_labeled/kink_test_3d.csv', xl = -2.0, xh = 2.0, yl = -2.0, yh = 2.0, iplot = True, responsive_surface_design=True)
In [4]:
print(kink_knot_x)
print(kink_knot_y)
0.10526315789473673
0.10526315789473673
In [5]:
print(kink_model_3d.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                      Z   R-squared:                       0.852
Model:                            OLS   Adj. R-squared:                  0.850
Method:                 Least Squares   F-statistic:                     473.5
Date:                Fri, 30 Aug 2024   Prob (F-statistic):          5.33e-201
Time:                        00:27:13   Log-Likelihood:                -1057.0
No. Observations:                 500   AIC:                             2128.
Df Residuals:                     493   BIC:                             2158.
Df Model:                           6
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          2.5839      0.253     10.229      0.000       2.088       3.080
X             -3.7916      0.210    -18.094      0.000      -4.203      -3.380
Y             -5.8180      0.212    -27.474      0.000      -6.234      -5.402
D_x           -0.3700      0.299     -1.236      0.217      -0.958       0.218
D_y            0.3550      0.301      1.181      0.238      -0.236       0.946
kink_x         8.6864      0.304     28.597      0.000       8.090       9.283
kink_y        12.8576      0.308     41.688      0.000      12.252      13.464
==============================================================================
Omnibus:                      234.186   Durbin-Watson:                   1.924
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             2187.502
Skew:                           1.805   Prob(JB):                         0.00
Kurtosis:                      12.590   Cond. No.                         6.63
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Responsive Surface Design¶
Bayesian optimization over well-characterized sample points.
In [9]:
print(ea.bayes_opt(filtered_samples, init_p= 2, iters=5, mode='minimize'))
|   iter    |  target   |     x     |     y     |
-------------------------------------------------
| 1         | -5.106    | -0.04683  | 0.2631    |
| 2         | -5.597    | 0.2158    | 0.1001    |
| 3         | -5.529    | 0.1725    | 0.06872   |
| 4         | -4.724    | -0.2136   | 0.2745    |
| 5         | -5.147    | -0.3221   | 0.01879   |
| 6         | -4.407    | -0.3221   | 0.2859    |
| 7         | -3.928    | 0.4128    | -0.1769   |
=================================================
{'target': -3.928409120419806, 'params': {'x': 0.4127809269364983, 'y': -0.1769472274940494}}
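For context, the loop behind a run like this can be sketched as a generic Bayesian-optimization routine: fit a Gaussian-process surrogate to the evaluated points, pick the next point with an acquisition function, and repeat. The objective, kernel, and acquisition choice below are illustrative assumptions, not the library's internals:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
f = lambda x: (x - 0.3) ** 2          # toy 1-D objective to minimize (assumed)

X = rng.uniform(-1, 1, (2, 1))        # init_p = 2 starting points
y = f(X).ravel()

for _ in range(5):                    # iters = 5
    # Fit the GP surrogate to everything evaluated so far
    gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True).fit(X, y)
    grid = np.linspace(-1, 1, 200).reshape(-1, 1)
    mu, sigma = gp.predict(grid, return_std=True)
    # Lower-confidence-bound acquisition: sample where mean - std is lowest
    x_next = grid[np.argmin(mu - sigma)]
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next))

best = X[np.argmin(y)].item()
print(best)  # best evaluated point so far
```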