• Table of Contents:
Reference Manual
• Licenses: Graph_Sampler is under GNU General Public License
• Overview:
• Installation:
• Running Graph_Sampler:
• Bibliographic References:
• Index:
bBN, bayesian_network
dynamic_bayesian_network
nNodes, n_nodes
autocycle
initial_adjacency
hyper_pB
bPriorConcordance, concordance_prior
edge_requirements
lambda_concord, lambda_concordance
bPriorDegreeNode, degree_prior
gamma_degree
bPriorMotif, motif_prior
alpha_motif
beta_motif
nData, n_data
data
bDirichlet, dirichlet_score
n_data_levels, nData_levels
bZellner, zellner_score
nRuns, n_runs
nBurnin, n_burnin
seed, random_seed
bsave_the_chain, save_chain
nSaved_adjacency, n_saved_adjacency
bsave_best_graph, save_best_graph
bsave_the_edge_probabilies, save_the_edge_probabilies
bsave_the_degree_counts, save_the_degree_counts
bsave_the_motifs_probabilies, save_the_motifs_probabilies
Graph_Sampler is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
Version 1.2, November 2002
Copyright © 2000,2001,2002 Free Software Foundation, Inc. 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.
The purpose of this License is to make a manual, textbook, or other functional and useful document free in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while not being considered responsible for modifications made by others.
This License is a kind of “copyleft”, which means that derivative works of the document must themselves be free in the same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.
We have designed this License in order to use it for manuals for free software, because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. But this License is not limited to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed book. We recommend this License principally for works whose purpose is instruction or reference.
This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. Such a notice grants a world-wide, royalty-free license, unlimited in duration, to use that work under the conditions stated herein. The “Document”, below, refers to any such manual or work. Any member of the public is a licensee, and is addressed as “you”. You accept the license if you copy, modify or distribute the work in a way requiring permission under copyright law.
A “Modified Version” of the Document means any work containing the Document or a portion of it, either copied verbatim, or with modifications and/or translated into another language.
A “Secondary Section” is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document’s overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. (Thus, if the Document is in part a textbook of mathematics, a Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.
The “Invariant Sections” are certain Secondary Sections whose titles are designated, as being those of Invariant Sections, in the notice that says that the Document is released under this License. If a section does not fit the above definition of Secondary then it is not allowed to be designated as Invariant. The Document may contain zero Invariant Sections. If the Document does not identify any Invariant Sections then there are none.
The “Cover Texts” are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in the notice that says that the Document is released under this License. A Front-Cover Text may be at most 5 words, and a Back-Cover Text may be at most 25 words.
A “Transparent” copy of the Document means a machine-readable copy, represented in a format whose specification is available to the general public, that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy made in an otherwise Transparent file format whose markup, or absence of markup, has been arranged to thwart or discourage subsequent modification by readers is not Transparent. An image format is not Transparent if used for any substantial amount of text. A copy that is not “Transparent” is called “Opaque”.
Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML, PostScript or PDF designed for human modification. Examples of transparent image formats include PNG, XCF and JPG. Opaque formats include proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the DTD and/or processing tools are not generally available, and the machine-generated HTML, PostScript or PDF produced by some word processors for output purposes only.
The “Title Page” means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly, the material this License requires to appear in the title page. For works in formats which do not have any title page as such, “Title Page” means the text near the most prominent appearance of the work’s title, preceding the beginning of the body of the text.
A section “Entitled XYZ” means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses following text that translates XYZ in another language. (Here XYZ stands for a specific section name mentioned below, such as “Acknowledgements”, “Dedications”, “Endorsements”, or “History”.) To “Preserve the Title” of such a section when you modify the Document means that it remains a section “Entitled XYZ” according to this definition.
The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. These Warranty Disclaimers are considered to be included by reference in this License, but only as regards disclaiming warranties: any other implication that these Warranty Disclaimers may have is void and has no effect on the meaning of this License.
You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies, and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3.
You may also lend copies, under the same conditions stated above, and you may publicly display copies.
If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more than 100, and the Document’s license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying in other respects.
If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit reasonably) on the actual cover, and continue the rest onto adjacent pages.
If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document, free of added material. If you use the latter option, you must take reasonably prudent steps, when you begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public.
It is requested, but not required, that you contact the authors of the Document well before redistributing any large number of copies, to give them a chance to provide you with an updated version of the Document.
You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document, thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must do these things in the Modified Version:
If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this, add their titles to the list of Invariant Sections in the Modified Version’s license notice. These titles must be distinct from any other section titles.
You may add a section Entitled “Endorsements”, provided it contains nothing but endorsements of your Modified Version by various parties—for example, statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard.
You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text, to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.
The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version.
You may combine the Document with other documents released under this License, under the terms defined in section 4 above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice, and that you preserve all their Warranty Disclaimers.
The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work.
In the combination, you must combine any sections Entitled “History” in the various original documents, forming one section Entitled “History”; likewise combine any sections Entitled “Acknowledgements”, and any sections Entitled “Dedications”. You must delete all sections Entitled “Endorsements.”
You may make a collection consisting of the Document and other documents released under this License, and replace the individual copies of this License in the various documents with a single copy that is included in the collection, provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects.
You may extract a single document from such a collection, and distribute it individually under this License, provided you insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim copying of that document.
A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a volume of a storage or distribution medium, is called an “aggregate” if the copyright resulting from the compilation is not used to limit the legal rights of the compilation’s users beyond what the individual works permit. When the Document is included in an aggregate, this License does not apply to the other works in the aggregate which are not themselves derivative works of the Document.
If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than one half of the entire aggregate, the Document’s Cover Texts may be placed on covers that bracket the Document within the aggregate, or the electronic equivalent of covers if the Document is in electronic form. Otherwise they must appear on printed covers that bracket the whole aggregate.
Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may include a translation of this License, and all the license notices in the Document, and any Warranty Disclaimers, provided that you also include the original English version of this License and the original versions of those notices and disclaimers. In case of a disagreement between the translation and the original version of this License or a notice or disclaimer, the original version will prevail.
If a section in the Document is Entitled “Acknowledgements”, “Dedications”, or “History”, the requirement (section 4) to Preserve its Title (section 1) will typically require changing the actual title.
You may not copy, modify, sublicense, or distribute the Document except as expressly provided for under this License. Any other attempt to copy, modify, sublicense or distribute the Document is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.
The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. See http://www.gnu.org/copyleft/.
Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered version of this License “or any later version” applies to it, you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the Free Software Foundation.
Graph_Sampler is an inference and simulation tool for networks (understood as graphs). It can simulate random graphs for general directed graphs (possibly cyclic) (see Bois & Gayraud 2013) or for directed acyclic graphs (Bayesian networks). The graphs are generated by Markov chain Monte Carlo simulations and their structure can be specified to follow probabilistic properties through the use of prior distributions. In the case of Bayesian networks, you can also infer their probable structure through the joint use of priors and data about node values (via a likelihood function).
You write an input file and run the compiled graph_sampler program. The input file specifies the kind of graph to simulate, some simulation parameters and output options, the priors you want, and any data and their likelihood (see Running Graph_Sampler). The simulation output is written to standard ASCII files.
No knowledge of computer programming is required, unless you want to tailor the program to special needs (in which case you may want to contact us).
Graph_Sampler is written in ANSI-standard C. We distribute the source code, and you should be able to compile it on any system that has an ANSI C compliant compiler.
On any system we recommend the GNU gcc compiler (free software). An automated compilation script (called Makefile) is provided and can be used if the standard command make is available to you.
If you want to modify the input file parser you will need lex and yacc (that is for experienced C programmers).
The Graph_Sampler source code is available on the Internet at:
https://sites.google.com/site/utcchairmmbsptp/graph_sampler.
To install on a Unix or GNU/Linux machine, download (in binary mode) the distributed archive file to your machine. Place it in a directory where there is no existing graph_sampler subdirectory that could be erased (make sure you check that). Decompress the archive with GNU gunzip (gunzip <archive-name>.tar.gz). Untar the decompressed archive with tar (tar xf <archive-name>.tar) (do man tar for further help). Many other archiving tools can be used in place of gunzip and tar. Move to the graph_sampler directory just created and issue the following command:
make
This command compiles the graph_sampler program.
You can also compile this manual as an info file with the command make info or as an html file with make html.
Under other operating systems (Windows, etc.) or if everything else fails you should be able to both uncompress and untar the archive with widely distributed archiving tools. Refer to the documentation of your C compiler to create an executable file from the source code files provided.
You are now ready to use Graph_Sampler.
After having compiled graph_sampler, you are ready to run it. For this you need to write an input file. This chapter explains how to write such files with the proper syntax.
In Unix the command-line syntax to run that executable is simply:
graph_sampler [input-file [output-prefix]]
where the brackets indicate optional arguments. If no input file and/or output prefix is specified, the program will use defaults. The default input file name is script.txt. The output files created depend on your selection in the input file (see below) and their names are printed on exit. Default output file names are best_graph.out, graph_samples.out, degree_count.out, motifs_count.out, edge_p.out, and results_mcmc.bin. If you specify only an input file name, the output file names will still be the default ones. If you specify an input file name and an output prefix, the standard output file names will be prefixed by it (e.g., with the prefix my the edge probabilities output file will be named my_edge_p.out).
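The naming rule can be sketched in a few lines of Python. This is only an illustration, not part of Graph_Sampler: the function name output_names is hypothetical, and the sketch assumes that every output file receives the prefix followed by an underscore, which the manual shows for edge_p.out only.

```python
# Default output file names listed in the manual.
DEFAULT_OUTPUTS = [
    "best_graph.out",
    "graph_samples.out",
    "degree_count.out",
    "motifs_count.out",
    "edge_p.out",
    "results_mcmc.bin",
]


def output_names(prefix=None):
    """Return the output file names, optionally prefixed.

    Assumption: the prefix is joined with an underscore for every file,
    as in the manual's single example (prefix my -> my_edge_p.out).
    """
    if not prefix:
        return list(DEFAULT_OUTPUTS)
    return [f"{prefix}_{name}" for name in DEFAULT_OUTPUTS]


if __name__ == "__main__":
    for name in output_names("my"):
        print(name)
```

Which of these files is actually written still depends on the output options selected in the input file.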
When the program starts, it announces which model description file was used to create it. While the input file is read or while simulations are running, some information will be printed on your computer screen. It can help you check that the input file is correctly interpreted and that the program runs as it should. Graph_Sampler can also post error messages, which should be self-explanatory. Where appropriate, they show the line number in the input file where the error occurred.
The program ends (if everything is fine) by giving you the names of the output files generated. If you want to run the program in batch mode (in the background), you may want to redirect the screen output and error messages; refer for this to the man pages for your command shell.
An input file specifies the kind of graph to simulate, some simulation parameters and output options, the priors you want, and any data and their likelihood. All that is done through the specification of predefined variables, using some keywords, user-defined variables, numbers and operators.
A Graph_Sampler input file is a text (ASCII) file that obeys a relatively simple syntax:
# this is a comment, comments are useful
Xa_2
Note that unassigned variables have a default value of zero.
nNodes
(5 + 6) * (3.4 / 1.1E-8) + Xa_2;
<variable> = <expression>;
Example:
Xa_2 = 5000; nNodes = 6 * Xa_2;
Xa_2 = (2 + 3) / (25. - 5.76);
1, 2, 2+1, 2*2, 5, Xa_2
<variable> = array{<list of expressions>};
The term array is a reserved keyword (see the list of those keywords below). Example:
n_data_levels = array{2, 2, 1+1};
<variable> = matrix{<list of expressions>};
The term matrix is a reserved keyword (see the list of those keywords below). Example:
data = matrix{ 1, 2, 2+1, 2*2, 5, Xa_2};
That is the general form; some matrices can accept keywords such as empty, full, or random instead of a list of expressions inside the curly braces (see the specification of each predefined matrix, below).
Here are the predefined variables that Graph_Sampler understands (they may have different synonyms, separated by commas):
bBN, bayesian_network
The predefined variable bBN indicates whether the graphs to sample are Bayesian networks (in that case it should be set to 1 or true) or general directed graphs (in which case it should be set to 0 or false). General directed graphs can only be simulated on the basis of priors. For Bayesian networks both simulation and structural inference can be performed. The default value for bBN is false. Example:
bBN = true; # bBN = 1 would also work
dynamic_bayesian_network
If dynamic_bayesian_network is set to 1 or true, the graphs sampled are dynamic Bayesian networks. For such networks both simulation and structural inference can be performed. The default value for dynamic_bayesian_network is false.
nNodes, n_nodes
The number of nodes in the network considered is specified by setting nNodes to an integer (not long integer) value. nNodes must be set before the initial adjacency matrix or the prior matrix on edge probabilities is defined. The default value for nNodes is 0, which raises an error message, because nNodes should be set to a meaningful value.
autocycle
The autocycle variable should be set to 1 (true) if edges from a node to itself are allowed, and to 0 (false) otherwise. Its default value is false. Setting it to true is incompatible with setting bBN to true (loops are not allowed in Bayesian networks).
initial_adjacency
The starting value of the graph adjacency matrix is defined by setting initial_adjacency, a square matrix of dimension nNodes. Matrix elements should be either 0 or 1. Element [i,j] is set to 1 if an edge (link) goes from node i to node j. Setting it to 0 indicates that there is no edge from node i to node j. Example:
nNodes = 3; initial_adjacency = matrix {0, 0, 0, 1, 0, 0, 1, 0, 0};
The initial_adjacency definition can also use an extended syntax:
initial_adjacency = matrix{empty | full | random};
where “|” means “or”.
If empty is used, all elements will be set to zero.
If full is used, all elements will be set to 1 when bBN is false. If bBN is true, the diagonal elements will be set to zero and the others to 1. If you want it to work with Bayesian networks, you should set bBN before defining initial_adjacency, because bBN defaults to false.
If random is used, all elements will be set randomly to 0 or 1 (with equal probability) when bBN is false. If bBN is true, the diagonal elements will be set to zero and the others to 0 or 1. If you want it to work with Bayesian networks, you should set bBN before defining initial_adjacency, because bBN defaults to false.
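The effect of the three keywords can be sketched in Python. This is an illustration of the rules stated above, not Graph_Sampler's own code; the function make_adjacency and its arguments are hypothetical names, and the optional rng argument is added only so that the random case can be reproduced.

```python
import random


def make_adjacency(n_nodes, kind, bBN=False, rng=None):
    """Build an n_nodes x n_nodes 0/1 matrix for 'empty', 'full' or 'random'.

    When bBN is true the diagonal is forced to zero, as the manual states.
    """
    if kind not in ("empty", "full", "random"):
        raise ValueError("kind must be 'empty', 'full' or 'random'")
    rng = rng or random.Random()
    matrix = []
    for i in range(n_nodes):
        row = []
        for j in range(n_nodes):
            if kind == "empty" or (bBN and i == j):
                value = 0
            elif kind == "full":
                value = 1
            else:  # 'random': 0 or 1 with equal probability
                value = rng.randint(0, 1)
            row.append(value)
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    for row in make_adjacency(3, "full", bBN=True):
        print(row)
```

The sketch also shows why the order of statements matters in an input file: the diagonal is only zeroed if bBN is already true when the matrix is built.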
hyper_pB
The matrix hyper_pB is a square matrix of dimension nNodes which specifies a prior distribution on edge probabilities. Each element [i,j] of hyper_pB is the parameter p (a double-precision real) of a Bernoulli distribution for the presence of an edge from node i to node j. In the case of Bayesian networks, p values should be 0 on the main diagonal. Example:
bBN = true; hyper_pB = matrix {0, 0.1, 0.1, 0.9, 0, 0.1, 0.9, 0.1, 0 };
Internally, hyper_pB is always used. If it is not defined by the user, p values will default to 0.5 (with zeroes on the diagonal if bBN is true), so that the prior is neutral (equal probability for the absence or presence of any edge).
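As a rough illustration of how such a prior scores a graph, the Python sketch below sums the Bernoulli log-probabilities over all matrix elements. It assumes the edges are independent under this baseline prior; the function name log_edge_prior is hypothetical and the code is not taken from Graph_Sampler.

```python
import math


def log_edge_prior(adjacency, hyper_pB):
    """Log-probability of a 0/1 adjacency matrix under element-wise
    Bernoulli priors with parameters hyper_pB[i][j].

    Assumption: edges are independent, so log-probabilities add up.
    """
    total = 0.0
    for a_row, p_row in zip(adjacency, hyper_pB):
        for a, p in zip(a_row, p_row):
            prob = p if a == 1 else 1.0 - p
            if prob == 0.0:
                return -math.inf  # configuration excluded by the prior
            total += math.log(prob)
    return total


if __name__ == "__main__":
    # Neutral default for a 3-node Bayesian network: 0.5 off the diagonal.
    default_p = [[0.0 if i == j else 0.5 for j in range(3)] for i in range(3)]
    graph = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
    print(log_edge_prior(graph, default_p))
```

With the neutral default, every graph without self-loops gets the same prior value, which is what makes that default uninformative.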
In the case of a Bayesian network, nodes which have been assigned a zero probability of having parents (a column of zeros in the hyper_pB matrix) are understood to be special “control” nodes for which the likelihood will not be computed. Such nodes will typically correspond to experimental design variables. They condition the likelihood of any children nodes they have and take the values assigned to them in the input file (in which case the “data” are forcing values rather than actual observations).
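To make the "column of zeros" rule concrete, here is a small Python sketch that lists the nodes that would be treated as control nodes for a given hyper_pB matrix. The function name control_nodes is hypothetical, and node indices are zero-based in this sketch only.

```python
def control_nodes(hyper_pB):
    """Return the (zero-based) indices of nodes whose column in hyper_pB
    contains only zeros, i.e. nodes that cannot have any parent."""
    n = len(hyper_pB)
    return [j for j in range(n)
            if all(hyper_pB[i][j] == 0 for i in range(n))]


if __name__ == "__main__":
    # Node 0 can be a parent of nodes 1 and 2 but has no possible parent.
    prior = [
        [0.0, 0.5, 0.5],
        [0.0, 0.0, 0.5],
        [0.0, 0.5, 0.0],
    ]
    print(control_nodes(prior))
```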
bPriorConcordance, concordance_prior
The flag bPriorConcordance set to 1 or true indicates that a concordance prior should be used (in addition to the baseline Bernoulli prior on individual edges). By default it is false. A concordance prior is an unnormalized score of the edge-wise difference between a reference adjacency matrix and the matrix being examined (see edge_requirements below).
edge_requirements
The matrix edge_requirements is a square matrix of dimension nNodes which specifies the concordance between the edges of a reference adjacency matrix and the current one. Each element [i,j] of edge_requirements can take a value of 1, -1, or 0.
lambda_concordance (see below).
Example:
bPriorConcordance = true; edge_requirements = matrix {-1, -1, 0, 1, -1, -1, 1, -1, -1};
By default, all elements of edge_requirements will be set to 0 when bBN is false. If bBN is true, the diagonal elements will be set to -1 and the others to 0. If you want it to work with Bayesian networks, you should set bBN before defining edge_requirements, because bBN defaults to false.
lambda_concord, lambda_concordance
The parameter lambda_concord is used to weight the differences between the reference adjacency matrix and the current adjacency matrix when bPriorConcordance is true. It should be set to a double (typically greater than zero). Its default value is 1.
bPriorDegreeNode, degree_prior
The flag degree_prior set to 1 or true indicates that an exponential prior is placed on the distribution of the nodes’ degrees (the number of incoming and outgoing edges for a given node) (see Bois & Gayraud 2013). It comes in addition to the baseline Bernoulli prior on individual edges. By default it is false.
gamma_degree
If bPriorDegreeNode is true, gamma_degree specifies the parameter of the exponential prior on degree counts. It should be set to a double (typically greater than zero). Its default value is 1.
bPriorMotif, motif_prior
The flag bPriorMotif set to 1 or true indicates that a beta-binomial prior is placed on the count of triangular feed-forward and feedback loops in the network (see Bois & Gayraud 2013). It comes in addition to the baseline Bernoulli prior on individual edges and is incompatible with Bayesian networks (an error message will be issued). By default it is false.
alpha_motif
If bPriorMotif is true, alpha_motif specifies the first parameter of the beta-binomial prior on loop counts. It should be set to an integer greater than zero. Its default value is 1.
beta_motif
If bPriorMotif is true, beta_motif specifies the second parameter of the beta-binomial prior on loop counts. It should be set to an integer greater than zero. Its default value is 1.
nData, n_data
If bBN is true, data can be input to infer the probability of the presence of edges on the basis of priors and data likelihood, in a fully Bayesian framework. The predefined variable nData should be set to an integer equal to the number of data points per node. Its default is zero. If no data are provided while bBN is true, simulations will proceed simply on the basis of prior distributions.
data
After nNodes, nData and bBN have been defined, a data matrix can also be defined (in fact, if nData is different from zero it must be defined). data has no default value. It should have nNodes rows and each row should be a vector of nData values (in double format). Example:
nNodes = 3; bBN = true; nData = 4; data = matrix {1.1, 1.3, 1.4, 1.35, 2.1, 2.4, 2.5, 2.45, 3.4, 3.6, 3.8, 3.85};
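The layout of the list can be pictured with a short Python sketch that splits it into one row per node. It assumes that the list is read row by row, which the grouping of the values in the example above suggests; the function name rows_from_flat is hypothetical.

```python
def rows_from_flat(values, n_nodes, n_data):
    """Split a flat list into n_nodes rows of n_data values each.

    Assumption: the list is given row by row (one node after another).
    """
    if len(values) != n_nodes * n_data:
        raise ValueError("expected n_nodes * n_data values")
    return [values[k * n_data:(k + 1) * n_data] for k in range(n_nodes)]


if __name__ == "__main__":
    flat = [1.1, 1.3, 1.4, 1.35,
            2.1, 2.4, 2.5, 2.45,
            3.4, 3.6, 3.8, 3.85]
    for node, row in enumerate(rows_from_flat(flat, n_nodes=3, n_data=4)):
        print(node, row)
```

A length check of this kind is a cheap way to catch a list that does not match nNodes and nData before running a simulation.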
bDirichlet, dirichlet_score
The data likelihood is by default normal given a vague normal-gamma prior for the regression parameters. An alternative, for discrete data, is to use a Dirichlet-multinomial model (see Laskey and Myers 2003, Heckerman et al. 1994, Heckerman et al. 1995). To that effect you just need to set dirichlet_score to true (its default value is false). In that case the data have to be coded by integers from zero to n. The number of levels for each node has to be specified using an n_data_levels declaration. The Dirichlet hyper-parameters are internally set to one, which specifies a uniform prior on configurations of parents for any node.
n_data_levels, nData_levels
If a Dirichlet-multinomial model is used, discrete data have to be specified for each node. Such data have to be coded as integers from zero to n, n being the number of levels for a given node. Those levels are specified using the n_data_levels array declaration.
n_data_levels = array{2, 2, 3, 2, 4};
bZellner, zellner_score
An alternative to the default data model is to use Zellner’s score. To that effect you just need to set zellner_score to true (its default value is false). The drawback is that no node can have more parents than it has data points (arguably, that’s an artificial constraint).
nRuns, n_runs
The total number of iterations to be performed by the MCMC sampler is specified by setting nRuns to a long integer value. Its default value is 1000000000 (yes, a billion).
nBurnin
, n_burnin
A certain number of “burn-in” iterations can be specified by setting
nBurnin
to a long integer value. In that case the recording of the MCMC chain
and the computation of summary outputs (such as the edge
probabilities) start only after nBurnin
iterations. Its default
value is zero. This is typically used to discard the part of the MCMC
chain that is not at equilibrium. However, checking that equilibrium
is attained is best done, in our opinion, by running multiple
independent chains and using the Gelman and Rubin R-hat diagnostic
(see Gelman & Rubin 1992 and other
relevant statistical literature).
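For illustration, here is a minimal sketch of the basic R-hat computation. It assumes you have already extracted one scalar summary per iteration (for example the log posterior or the number of edges) from each of several chains run with different seeds. The function name gelman_rubin_rhat is hypothetical, and the sketch omits the degrees-of-freedom correction discussed in the original paper.

```python
from statistics import mean, variance


def gelman_rubin_rhat(chains):
    """Basic potential scale reduction factor for m chains of length n.

    `chains` is a list of equal-length lists of a scalar summary,
    one list per independent chain (post burn-in values only).
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    # W: average of the within-chain sample variances.
    w = mean(variance(c) for c in chains)
    if w == 0:
        raise ValueError("within-chain variance is zero; R-hat is undefined")
    # B/n: sample variance of the chain means.
    b_over_n = variance([mean(c) for c in chains])
    # Pooled estimate of the target variance.
    var_hat = (n - 1) / n * w + b_over_n
    return (var_hat / w) ** 0.5


if __name__ == "__main__":
    well_mixed = [[0, 1, 0, 1], [1, 0, 1, 0]]
    far_apart = [[0, 1, 0, 1], [10, 11, 10, 11]]
    print(gelman_rubin_rhat(well_mixed), gelman_rubin_rhat(far_apart))
```

Values close to 1 suggest that the chains agree; values well above 1 indicate that they have not yet converged to the same distribution.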
seed
, random_seed
The starting value of the pseudo-random generator seed
can be
explicitly set to any long integer greater than zero. That
allows repeating exactly the same sequence of random numbers. Setting
different seeds is required to generate different chains for the same problem in order to
check the convergence of the MCMC simulations. If it is not set by the
user, seed
has a default value of 314159265.3589793.
bsave_the_chain
, save_chain
The MCMC sampling chain can be saved in binary format to a file (named
results_mcmc.bin) by setting bsave_the_chain
to 1 or
true
. By default, the chain is not saved. Beware, MCMC chains
can be very large, even though the recording format is very compact:
results_mcmc.bin starts with the number of nodes in the graph
(as a binary integer, i.e. a byte), followed by the value of the
adjacency matrix (nNodes by nNodes bytes) at the end of the burn-in
period, followed by a one-byte encoding of the difference between
successive adjacency matrices. The difference d between adjacency
matrices (equal to -1 for removing an edge, +1 for adding an edge) and
its location [i,j] are encoded as:
(i + j * nNodes + 1) * d.
No difference is encoded as zero. The results_mcmc.bin file can
be used to recreate the successive adjacency matrices sampled.
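The encoding formula above can be inverted as in the following sketch. It works on already-decoded integer codes and does not read results_mcmc.bin itself. Two points are assumptions rather than statements from this manual: i and j are taken to be zero-based, and the location [i,j] is taken to mean row i, column j of the adjacency matrix. All function names are hypothetical.

```python
def encode_change(i, j, d, n_nodes):
    """Encode a change d (+1 or -1) at location [i, j] as in the manual."""
    return (i + j * n_nodes + 1) * d


def decode_change(code, n_nodes):
    """Invert encode_change; returns None when the code is zero (no change)."""
    if code == 0:
        return None
    d = 1 if code > 0 else -1
    position = abs(code) - 1          # equals i + j * n_nodes
    i = position % n_nodes            # valid because 0 <= i < n_nodes (assumed)
    j = position // n_nodes
    return i, j, d


def replay(initial, codes):
    """Yield the successive adjacency matrices implied by a list of codes."""
    n_nodes = len(initial)
    adjacency = [row[:] for row in initial]
    for code in codes:
        change = decode_change(code, n_nodes)
        if change is not None:
            i, j, d = change
            adjacency[i][j] += d      # +1 adds the edge, -1 removes it
        yield [row[:] for row in adjacency]


if __name__ == "__main__":
    start = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
    codes = [encode_change(0, 1, 1, 3), 0, encode_change(0, 1, -1, 3)]
    for matrix in replay(start, codes):
        print(matrix)
```

Because i is smaller than nNodes, the remainder and quotient of the absolute code minus one recover i and j unambiguously, and the sign of the code gives d.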
nSaved_adjacency
, n_saved_adjacency
The user can request the output of a number n_saved_adjacency
(integer) of randomly generated adjacency matrices. Those matrices are
saved at regularly spaced iterations along the MCMC chain (after the
burn-in period) in the file graph_samples.out in text format,
along with the logarithms of the prior probability, data likelihood
(if data were specified) and posterior probability. By default
n_saved_adjacency
is zero and no matrices are recorded.
bsave_best_graph
, save_best_graph
By setting bsave_best_graph
to true
, the user can
request the output of the adjacency matrix of the graph having the
highest posterior probability among all random graphs generated after
the burn-in period. That matrix is saved in the file
best_graph.out in text format, along with the logarithms of
its prior probability, data likelihood (if data were specified) and
posterior probability. By default bsave_best_graph
is
false
.
bsave_the_edge_probabilies
, save_the_edge_probabilies
Setting bsave_the_edge_probabilies
to true
forces the
output of a matrix of the individual edge probabilities in the file
edge_p.out, in text format. By default
bsave_the_edge_probabilies
is false
.
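To make the notion of an edge probability concrete, the sketch below computes the fraction of sampled graphs that contain each edge. This is the usual Monte Carlo estimate and is an assumption here: the manual does not spell out how edge_p.out is computed, and the function name edge_frequencies is hypothetical.

```python
def edge_frequencies(samples):
    """Fraction of sampled adjacency matrices in which each edge is present.

    `samples` is a non-empty list of square 0/1 matrices (lists of lists),
    for example matrices recreated from a saved chain.
    """
    n_samples = len(samples)
    n_nodes = len(samples[0])
    return [[sum(s[i][j] for s in samples) / n_samples
             for j in range(n_nodes)]
            for i in range(n_nodes)]


if __name__ == "__main__":
    sampled = [
        [[0, 1], [0, 0]],
        [[0, 1], [1, 0]],
        [[0, 0], [1, 0]],
        [[0, 1], [0, 0]],
    ]
    print(edge_frequencies(sampled))
```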
bsave_the_degree_counts
, save_the_degree_counts
Setting bsave_the_degree_counts
to true
forces the
output of a count of the nodes’ degrees in the graphs sampled after
the burn-in period to the file degree_count.out, in text
format. By default bsave_the_degree_counts
is
false
.
bsave_the_motifs_probabilies
, save_the_motifs_probabilies
Setting bsave_the_motifs_probabilies
to true
forces the
output (to the file motifs_count.out, in text format) of a
count of triangular feed-forward and feedback loops in the graphs
sampled after the burn-in period. By default
bsave_the_motifs_probabilies
is false
.
The following keywords can be used in Graph_Sampler input files:
false
(or FALSE
) keyword
This keyword is equivalent to zero and can be used when assigning variables.
true
(or TRUE
) keyword
This keyword is equivalent to 1 and can be used when assigning variables.
array
keyword
This keyword is used for vector definition. Example:
n_data_levels = array {2, 2, 1+1};
matrix
keyword
This keyword is used for matrix definition. Example:
data = matrix {1.1, 1.3, 1.4, 1.35, 2.1, 2.4, 2.5, 2.45, 3.4, 3.6, 3.8, 3.85};
empty
keyword
This keyword can be used to create an empty initial adjacency matrix
(syntax: matrix{empty};
).
full
keyword
This keyword can be used to create a full initial adjacency matrix
(all elements at 1, except a zeroed diagonal in Bayesian networks)
(syntax: matrix{full};
).
random
keyword
This keyword can be used to create a random initial adjacency matrix
(all elements 0 or 1 at random, except a zeroed diagonal in Bayesian
networks) (syntax: matrix{random};
).
Bibliographic References
Barry T.M. (1996). Recommendations on the testing and use of pseudo-random number generators used in Monte Carlo analysis for risk assessment. Risk Analysis 16:93-105.
Bernardo J.M. and Smith A.F.M. (1994). Bayesian Theory. Wiley, New York.
Bois F. and Gayraud G. (2013). Probabilistic generation of random networks taking into account information on motifs occurrence, arXiv:1311.6443 [q-bio.QM].
Gelman A. and Rubin D.B. (1992). Inference from iterative simulation using multiple sequences (with discussion). Statistical Science 7:457-511.
Heckerman et al. (1994). In Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, Seattle, WA, p. 293-301. Morgan Kaufmann.
Heckerman et al. (1995). Machine Learning 20:197-243.
Laskey and Myers (2003). Machine Learning 50:175-196.