Programs

1. GO term summarization: Given a list of (downstream) genes, summarize them with a small set of GO terms from the biological process domain. (Put all four files into the same folder and run userCode_1.py.)

a. userCode_1.py --- Sample code showing how to use the class GoGraph.

b. GotermSummarization_PV_AllGenesInAssociationFile_quick.py --- Defines the class that uses GO terms to summarize a list of genes.

c. gene_association.goa_human_2012 --- The gene-to-GO-term association file from http://www.geneontology.org.

d. newWeightedPubMedGO.xml --- Weighted GO structure file, obtained by assigning semantic distances^*1 as weights to all edges of the Gene Ontology structure from http://www.geneontology.org.
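As a rough illustration of the input side of step 1, the Python sketch below reads a gene association file and collects the direct biological-process annotations for a list of genes. It assumes the file follows the standard GAF tab-separated layout (column 3 gene symbol, column 4 qualifier, column 5 GO ID, column 9 aspect, where "P" means biological process; lines starting with "!" are comments). The functions load_bp_annotations and count_terms are hypothetical helpers, not part of GoGraph, and the sketch neither propagates annotations up the GO hierarchy nor uses the weighted structure in newWeightedPubMedGO.xml, so it is only a starting point, not the summarization method itself.

```python
from collections import defaultdict


def load_bp_annotations(lines, genes):
    """Map each gene symbol in `genes` to its set of biological-process GO IDs.

    Assumes GAF-style tab-separated lines (hypothetical helper, not GoGraph's API):
    column 3 = gene symbol, column 4 = qualifier, column 5 = GO ID,
    column 9 = aspect ('P' = biological process). '!' lines are comments.
    """
    wanted = set(genes)
    annotations = defaultdict(set)
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header/comment and blank lines
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) < 9:
            continue  # skip malformed rows
        symbol, qualifier, go_id, aspect = cols[2], cols[3], cols[4], cols[8]
        if aspect != "P" or "NOT" in qualifier:
            continue  # keep positive biological-process annotations only
        if symbol in wanted:
            annotations[symbol].add(go_id)
    return dict(annotations)


def count_terms(annotations):
    """Count how many of the input genes each GO term annotates directly."""
    counts = defaultdict(int)
    for terms in annotations.values():
        for go_id in terms:
            counts[go_id] += 1
    return dict(counts)
```

Because a file object iterates line by line, passing open("gene_association.goa_human_2012") as `lines` together with a gene list is enough to build the annotation map; the direct counts from count_terms would then still need to be propagated to ancestor terms before any summarization.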

2. Finding a highly dense subgraph: Given a bipartite graph, find a high-density subgraph with the maximum score. (Put both files into the same folder and run userCode_2.py.)

a. userCode_2.py --- Sample code showing how to use the class DenseSubGraph.

b. DenseSubGraph_Sorted.py --- Defines the class that finds a highly dense subgraph.
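The README does not specify the score that DenseSubGraph maximizes. Purely as an illustration, the sketch below swaps in a different, standard objective, the average-degree density |E|/|V|, together with the greedy peeling heuristic: repeatedly delete a minimum-degree vertex and remember the densest intermediate vertex set, which is known to reach at least half of the optimal density for this objective. The function name densest_subgraph_peeling and the edge-list input are assumptions; this is not the algorithm in DenseSubGraph_Sorted.py.

```python
from collections import defaultdict


def densest_subgraph_peeling(edges):
    """Greedy peeling for density |E| / |V| on a bipartite graph.

    `edges` is an iterable of (left, right) pairs (hypothetical input format).
    Returns (left_nodes, right_nodes, density) of the densest vertex set
    seen while peeling.
    """
    adj = defaultdict(set)
    for left, right in edges:
        u, v = ("L", left), ("R", right)  # tag sides so labels cannot collide
        adj[u].add(v)
        adj[v].add(u)

    alive = set(adj)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_density, best_nodes = 0.0, set()

    while alive:
        density = n_edges / len(alive)
        if density > best_density:
            best_density, best_nodes = density, set(alive)
        # Peel off one vertex of minimum degree among the remaining vertices.
        node = min(alive, key=lambda x: len(adj[x]))
        for nbr in adj[node]:
            adj[nbr].discard(node)
        n_edges -= len(adj[node])
        adj[node] = set()
        alive.remove(node)

    left_nodes = {name for side, name in best_nodes if side == "L"}
    right_nodes = {name for side, name in best_nodes if side == "R"}
    return left_nodes, right_nodes, best_density
```

For example, with edges [("a", 1), ("a", 2), ("b", 1), ("b", 2), ("c", 3)] the sketch returns ({"a", "b"}, {1, 2}, 1.0): peeling "c" and 3 leaves the complete 2x2 block. The linear scan for the minimum-degree vertex makes this quadratic in the number of vertices; a bucket queue keyed by degree would bring it close to linear time.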

3. ME algorithm: Given a gene-tumor relation graph in which each gene has a real-valued weight, find a set of genes with the minimum weight sum that covers the maximum number of tumors. (Put all four files into the same folder and compile userCode_3.cpp on a Linux or Unix system.)

a. userCode_3.cpp --- Sample code showing how to use the ME algorithm.

b. BiGraph_ME.h, MutuallyExclusive.h --- The main ME program.

c. sampleData_4_ME_algorithm.txt --- Sample gene-tumor relation graphs.
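To make the optimization goal of step 3 concrete, the sketch below applies the classic greedy heuristic for weighted set cover: every tumor hit by at least one gene is a target, and at each step the gene with the smallest weight per newly covered tumor is added. This is a swapped-in technique, not the ME algorithm in BiGraph_ME.h and MutuallyExclusive.h, and it only carries the usual logarithmic approximation bound of greedy set cover. It is written in Python to match the other examples; the function name greedy_weighted_cover and the dict-based input are hypothetical, and weights are assumed to be finite and non-negative.

```python
def greedy_weighted_cover(gene_tumors, weights):
    """Greedy weighted set cover over a gene -> tumors mapping.

    gene_tumors: dict gene -> iterable of tumor IDs (hypothetical format)
    weights: dict gene -> finite, non-negative weight
    Returns (chosen_genes_in_order, total_weight).
    """
    # Every tumor hit by at least one gene is a target; others cannot be covered.
    uncovered = set().union(*gene_tumors.values())
    chosen, total = [], 0.0
    while uncovered:
        best_gene, best_ratio = None, float("inf")
        for gene, tumors in gene_tumors.items():
            gain = len(uncovered.intersection(tumors))
            if gain == 0:
                continue  # gene covers nothing new (includes already chosen genes)
            ratio = weights[gene] / gain  # cost per newly covered tumor
            if ratio < best_ratio:
                best_gene, best_ratio = gene, ratio
        chosen.append(best_gene)
        total += weights[best_gene]
        uncovered.difference_update(gene_tumors[best_gene])
    return chosen, total
```

On the toy input gene_tumors = {"g1": {"t1", "t2", "t3"}, "g2": {"t1"}, "g3": {"t4"}, "g4": {"t3", "t4"}} with weights {"g1": 3.0, "g2": 0.5, "g3": 1.0, "g4": 4.0}, it picks g2, then g3, then g1, for a total weight of 4.5.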

Note: The code requires the NetworkX package, version 1.3 or later.

Footnote:

*1--Bo Jin, Xinghua Lu: Identifying informative subsets of the Gene Ontology with information bottleneck methods. Bioinformatics 26(19): 2445-2451 (2010).