Priors
The priors pickle file, priors_*.pkl, is loaded as a dictionary with two sub-dictionaries, metadata and priors. The priors that are stored in the priors sub-dictionary are built using the equation-tree package. We will give brief definitions of the information stored in this sub-dictionary, but for a detailed look see the equation-tree package documentation. It is important to note that the equation-tree package uses binary expression trees where operators (e.g., +) have two inputs and functions (e.g., sin) have one input. For example, the expression m*x+b*sin(y) could be represented as:
        +
      /   \
     *     *
    / \   / \
   m   x b   sin
              |
              y
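To make the tree above concrete, here is a minimal Python sketch that builds the same expression by hand and measures it. The Node class and the helper functions are hypothetical illustrations and are not part of the equation-tree API. Counting depth as the number of node layers including the root is also an assumption; the package may count differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical tree node: operators have two children, functions one, leaves none."""
    label: str
    children: List["Node"] = field(default_factory=list)


def depth(node: Node) -> int:
    """Number of node layers from this node down to its deepest leaf (root counts as 1)."""
    if not node.children:
        return 1
    return 1 + max(depth(child) for child in node.children)


def count_nodes(node: Node) -> int:
    """Total number of nodes (operators, functions, constants, and variables)."""
    return 1 + sum(count_nodes(child) for child in node.children)


def prefix(node: Node) -> List[str]:
    """Pre-order (parent before children) listing of the node labels."""
    labels = [node.label]
    for child in node.children:
        labels.extend(prefix(child))
    return labels


# m*x + b*sin(y), built to match the diagram above
tree = Node("+", [
    Node("*", [Node("m"), Node("x")]),
    Node("*", [Node("b"), Node("sin", [Node("y")])]),
])
```

For this tree, depth(tree) returns 4, count_nodes(tree) returns 8, and prefix(tree) returns the labels in the order +, *, m, x, *, b, sin, y.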
The full structure of the pickle file looks like this:
es_priors
│
└───metadata
│ │ number_of_equations: The number of parsed equations
│ │ unparsed_equations: The number of equations that failed to parse
│ │ list_of_operators: The list of operators considered when parsing equations and building priors
│ │ list_of_functions: The list of functions considered when parsing equations and building priors
│ │ list_of_constants: The list of words/symbols representing constants when parsing equations and building priors
│ │ list_of_equations: The list of each parsed equation
│
└───priors
│ max_depth: The frequency of the number of nodes (operators, functions, constants, & variables) in the expression tree
│ depth: The frequency of node layers in the expression tree
│ structures: The list of expression tree structures across equations
│ features: The number of constants and variables across equations*
│ functions: Frequency count of each function across equations
│ operators: Frequency count of each operator across equations
│ operator_and_functions: Frequency count of each function and operator across equations
│ function_conditionals: Frequency count of conditional functions across equations
│ operator_conditionals: Frequency count of conditional operators across equations
*Note that the constant and variable counts are difficult to extract when scraping Wikipedia, so these values are likely incorrect. Use with caution.
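As a minimal sketch of reading the file in Python, assuming it is a standard pickle whose top-level dictionary holds the metadata and priors keys described above: the function names load_priors and summarize are hypothetical helpers, and the file name in the usage comment is only a placeholder.

```python
import pickle
from pathlib import Path


def load_priors(path):
    """Load a priors pickle file and return its (metadata, priors) sub-dictionaries."""
    with Path(path).open("rb") as handle:
        data = pickle.load(handle)
    # Assumption: the top-level dictionary holds the two keys described above.
    return data["metadata"], data["priors"]


def summarize(metadata, priors):
    """Collect a few headline values from the loaded dictionaries."""
    return {
        "parsed": metadata["number_of_equations"],
        "unparsed": metadata["unparsed_equations"],
        "prior_keys": sorted(priors),
    }


# Example usage (placeholder file name):
# metadata, priors = load_priors("priors_example.pkl")
# print(summarize(metadata, priors))
```

Because unpickling can execute arbitrary code, only load priors files that come from a source you trust.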