pyfoma package

Submodules

pyfoma.algorithms module

pyfoma.cfg module

Context-Free Grammar tools

pyfoma.cfg.draw_cfg(string: str, style='tree')[source]: Draws a CFG string as a tree with graphviz. :param string: The CFG string to render :param style: The style to render the CFG in. Choose from ‘tree’, ‘boxes’. :return: A graphviz.Graph object.

pyfoma.eliminate_flags module

pyfoma.flag module

class pyfoma.flag.FlagFilter(alphabet: Set[str])[source]

Bases: object

__init__(alphabet: Set[str])[source]

Create FlagFilter from an FST alphabet

Parameters:: alphabet – A symbol set (containing strings)

class pyfoma.flag.FlagOp(sym: str)[source]

Bases: object

__init__(sym: str)[source]

Creates a Flag diacritic

Parameters:: sym – String representation of flag diacritic

The parameter ‘sym’ should follow the format [[XYZ]], for example “[[$Num=Sg]]”, where:

X is a variable name matching the regex “[$]w+”

2. Y is one of the operators “=” (set value), “==” (check that value equals), “!=” (check that value does not equal) or “$=” (unify to value) 3. Z is a value matching the regex “[$]?w+”. If the value starts with $, then it refers to a variable.

More formally, any flag diacritic needs to match the expression flag.FLAGRE.

check(config: Dict[str, str], val: str) → bool[source]: The operator “==”

static filter_flags(seq: Sequence[str]) → Sequence[str][source]: Filter out flag diacritics from symbol sequence

static is_flag(sym: str) → bool[source]

Check that ‘sym’ matches the format required by FlagOp.__init__

Parameters:: sym – A string
Returns:: True is ‘sym’ is a valid flag diacritic. False,

otherwise.

neg_check(config: Dict[str, str], val: str) → bool[source]: The operator “!=”

setv(config: Dict[str, str], val: str) → bool[source]: The operator “=”

unify(config: Dict[str, str], val: str) → bool[source]: The operator “?=”

class pyfoma.flag.FlagStreamFilter(alphabet: Set[str])[source]

Bases: FlagFilter

__init__(alphabet: Set[str])[source]

Create FlagStreamFilter from an FST alphabet

Parameters:: alphabet – A symbol set (containing strings)

check(sym: str) → bool[source]

Read next symbol and check fla diacritic configuration

Parameters:: sym – A symbol in self.alphabet
Returns:: True if the combination of flag diactritics upto this

point is valid. False, otherwise.

Raises KeyError when ‘sym’ is missing from the FST alphabet.

reset()[source]: Reset all variables to empty value “{}”

class pyfoma.flag.FlagStringFilter(alphabet: Set[str])[source]: Bases: FlagFilter

class pyfoma.flag.TestFlagFilter(methodName='runTest')[source]

Bases: TestCase

test_init()[source]

test_stream()[source]

test_strings_neg()[source]

test_strings_pos()[source]

class pyfoma.flag.TestFlagOp(methodName='runTest')[source]

Bases: TestCase

test_call()[source]

test_init()[source]

test_init_non_flag()[source]

test_is_flag()[source]

pyfoma.flag.time_flag_execution(trials)[source]

pyfoma.fst module

class pyfoma.fst.FST(label: Tuple | None = None, weight=0.0, alphabet={})[source]

Bases: object

__init__(label: Tuple | None = None, weight=0.0, alphabet={})[source]

Creates an FST-structure with a single state.

Parameters:

label – create a two-state FST that accepts label
weight – add a weight to the final state
alphabet – declare an alphabet explicitly

If ‘label’ is given, a two-state automaton is created with label as the only transition from the initial state to the final state.

If ‘weight’ is also given, the final state will have that weight. Labels are always tuples internally, so a two-state automaton that only accepts ‘a’ should have label = (‘a’,).

If label is the empty string, i.e. (‘’,), the second state will not be created, but the initial state will be made final with weight ‘weight’.

add_weight(weight) → FST[source]: Add weight to the set of final states in the FST.

alphabet: The alphabet used by the FST

analyze(word, weights=False, tokenize_outputs=False, obey_flags=True, print_flags=False)[source]: Pass word through FST and return generator that yields all inputs.

apply(word, inverse=False, weights=False, tokenize_outputs=False, obey_flags=True, print_flags=False)[source]: Pass word through FST and return generator that yields outputs. if inverse == True, map from range to domain. weights is by default False. To see the cost, set weights to True. obey_flags toggles whether invalid flag diacritic combinations are filtered out. By default, flags are treated as epsilons in the input. print_flags toggels whether flag diacritics are printed in the output.

become(other: FST)[source]: Hacky or what? We use this to mutate self for those algorithms that don’t directly do it.

classmethod character_ranges(ranges, complement=False) → FST[source]: Returns a two-state FSM from a list of unicode code point range pairs. Keyword arguments: complement – if True, the character class is negated, i.e. [^ … ], and a two-state FST is returned with the single label . and all the symbols in the character class are put in the alphabet.

cleanup_sigma() → FST[source]: Remove symbols if they are no longer needed, including . . Returns a new FST with the cleaned alphabet.

complement() → FST[source]: Returns the complement of an FST.

compose(fst2: FST) → FST[source]: Composition of A,B; will expand an acceptor into 2-tape FST on-the-fly.

concatenate(fst2: FST) → FST[source]: Concatenation of T1T2. No epsilons. May produce non-accessible states.

context_restrict(*contexts, rewrite=False) → FST[source]: Only allow self in the context L1 _ R1, or … , or L_n _ R_n.

copy_filtered(labelfilter=<function FST.<lambda>>)[source]: Create a copy of self, possibly filtering out labels where them optional function ‘labelfilter’ returns False.

copy_mod(modlabel=<function FST.<lambda>>, modweight=<function FST.<lambda>>)[source]: Copies an FSM and possibly modifies labels and weights through functions. Keyword arguments: modlabel – a function that modifies the label, takes label, weight as args. modweights – a function that modifies the weight, takes label, weight as args.

cross_product(fst2: FST, optional: bool = False) → FST[source]: Perform the cross-product of T1, T2 through composition. Keyword arguments: optional – if True, calculates T1:T2 | T1.

determinize(staterep=<function FST.<lambda>>, oplus=<built-in function min>) → FST[source]: Weighted determinization of FST.

determinize_as_dfa() → FST[source]: Determinize as a DFA with weight as part of label, then apply unweighted det.

determinize_unweighted() → FST[source]: Determinize with all zero weights.

difference(fst2: FST) → FST[source]: Returns self-other. Uses the product algorithm.

eliminate_flags() → FST[source]: Equivalent behavior but no flag diacritics.

epsilon_remove() → FST[source]: Create new epsilon-free FSM equivalent to original.

filter_accessible() → FST[source]: Remove states that are not on a path from the initial state.

filter_coaccessible() → FST[source]: Remove states and transitions to states that have no path to a final state.

finalstates: A set of all final (accepting) states of the FST

classmethod from_strings(strings, multichar_symbols=None)[source]: Create an automaton that accepts words in the iterable ‘strings’.

generate(word, weights=False, tokenize_outputs=False, obey_flags=True, print_flags=False)[source]: Pass word through FST and return generator that yields all outputs.

ignore(fst2: FST) → FST[source]: A, ignoring intervening instances of B.

initialstate: The initial (start) state of the FST

intersection(fst2: FST) → FST[source]: Intersection of self and other. Uses the product algorithm.

invert() → FST[source]: Calculate the inverse of a transducer, i.e. flips label tuples around.

kleene_closure(mode='star') → FST[source]: Apply self*. No epsilons here. If mode == ‘plus’, calculate self+.

kleene_plus() → FST[source]: Apply self+.

kleene_star() → FST[source]: Apply self*.

label_states_topology(mode='BFS') → FST[source]: Topologically sort and label states with numbers. Keyword arguments: mode – ‘BFS’, i.e. breadth-first search by default. ‘DFS’ is depth-first.

classmethod load(path: str) → FST[source]: Loads an FST from a .fst file. :param path: The path to load from. Must be a .fst file :type path: str

map_labels(map: dict) → FST[source]

Relabel the transducer with new labels from dictionary mapping.

Example: fst.map_labels({‘a’:’’, ‘b’:’a’})

merge_equivalent_states(equivalenceclasses: set) → FST[source]: Merge equivalent states given as a set of sets.

minimize() → FST[source]: Minimize by constrained reverse subset construction, Hopcroft-ish.

minimize_as_dfa() → FST[source]: Minimize as a DFA with weight as part of label, then apply unweighted min.

minimize_brz() → FST[source]: Minimize through Brzozowski’s trick.

number_unnamed_states(force=False) → dict[source]: Sequentially number those states that don’t have the ‘name’ attribute. If ‘force == True’, number all states.

optional() → FST[source]: Calculate T|’’ .

product(fst2: ~pyfoma.fst.FST, finalf=<built-in function any>, oplus=<built-in function min>, pathfollow=<function FST.<lambda>>) → FST[source]: Generates the product FST from fst1, fst2. The helper functions by default produce fst1|fst2.

project(dim=0) → FST[source]: Project fst. dim = -1 will get output proj regardless of # of tapes.

push_weights() → FST[source]: Push weights toward the initial state. Calls dijkstra and maybe scc.

classmethod re(regularexpression, defined={}, functions={}, multichar_symbols=None)

Compile a regular expression and return the resulting FST. Keyword arguments: defined – a dictionary of defined FSTs that the compiler can access whenever

a defined network is referenced in the regex, e.g. $vowel

functions – a set of Python functions that the compiler can access when a function: is referenced in the regex, e.g. $^myfunc(…)

classmethod regex(regularexpression, defined={}, functions={}, multichar_symbols=None)[source]

Compile a regular expression and return the resulting FST. Keyword arguments: defined – a dictionary of defined FSTs that the compiler can access whenever

a defined network is referenced in the regex, e.g. $vowel

functions – a set of Python functions that the compiler can access when a function: is referenced in the regex, e.g. $^myfunc(…)

render(view=True, filename: str = 'FST', format='pdf', tight=True)[source]: Renders the FST to a file and optionally opens the file. :param view: If True, the rendered file will be opened. :param format: The file format for the Digraph. Typically ‘pdf’, ‘png’, or ‘svg’. View all formats: https://graphviz.org/docs/outputs/ :param tight: If False, the rendered file will have whitespace margins around the graph.

reverse() → FST[source]: Reverse the FST, epsilon-free.

reverse_e() → FST[source]: Reverse the FST, using epsilons.

rewrite(*contexts, **flags) → FST[source]: Rewrite self in contexts in parallel, controlled by flags.

classmethod rlg(grammar, startsymbol, multichar_symbols=None)[source]: Compile a (weighted) right-linear grammar into an FST, similarly to lexc.

save(path: str)[source]: Saves the current FST to a file. :param path: The path to save to (without a file extension) :type path: str

save_att(base: PathLike, state_symbols=False, epsilon='@0@')[source]

Save to AT&T format files for use with other FST libraries (Foma, OpenFST, RustFST, HFST, etc).

This will, in addition to saving the transitions in base, also create separate files with the extensions .isyms and .osyms containing the input and output symbol tables (so for example if base is test.fst, it will create test.isyms and test.osyms)

Note also that the AT&T format has no mechanism for quoting or escaping characters (notably whitespace) in symbols and state names, but only tabs are used as field separators by default, so any other characters should be acceptable (though not always recommended). The symbol @0@ is used by default for epsilon (but can be changed with the epsilon parameter) as this is Foma’s default, and will always have symbol ID 0 as this is required by OpenFST.

If state_symbols is true, the names of states will be retained in the output file and a state symbol table created with the extension .ssyms. This option is disabled by default since it is not compatible with Foma.

Note also that unreachable states are not inclued in the output.

states: A set of all states in the FST

todict() → Dict[str, Any][source]: Create a dictionary form of the FST for export to JSON. May be post-processed for optimization in Javascript.

tojs(jsnetname: str = 'myNet') → str[source]: Create Javascript compatible with foma_apply_down.js

tokenize_against_alphabet(word) → list[source]: Tokenize a string using the alphabet of the automaton.

trim() → FST[source]: Remove states that aren’t both accessible and coaccessible.

union(fst2: FST) → FST[source]: Epsilon-free calculation of union of self and fst2.

view(raw=False, show_weights=False, show_alphabet=True) → graphviz.Digraph[source]

Creates a ‘graphviz.Digraph’ object to view the FST. Will automatically display the FST in Jupyter.

param raw:

if True, show label tuples and weights unformatted

param show_weights:

force display of weights even if 0.0

param show_alphabet:

displays the alphabet below the FST

return:

A Digraph object which will automatically display in Jupyter.

If you would like to display the FST from a non-Jupyter environment, please use FST.render

words()[source]: A generator to yield all words. Yay BFS!

words_cheapest()[source]: A generator to yield all words in order of cost, cheapest first.

words_nbest(n) → list[source]: Finds the n cheapest word in an FST, returning a list.

pyfoma.fst.concatenate(fst1: FST, fst2: FST)[source]

pyfoma.fst.context_restrict(fst: FST, *contexts, rewrite=False)[source]

pyfoma.fst.determinize(fst: FST)[source]

pyfoma.fst.harmonize_alphabet(func)[source]: A wrapper for expanding .-symbols when operations of arity 2 are performed. For example, if calculating the union of FSM1 and FSM2, and both contain .-symbols, the transitions with . are expanded to include the symbols that are present in the other FST.

pyfoma.fst.ignore(fst1: FST, fst2: FST)[source]

pyfoma.fst.invert(fst: FST)[source]

pyfoma.fst.minimize(fst: FST)[source]

pyfoma.fst.project(fst: FST, dim=0)[source]

pyfoma.fst.reverse(fst: FST)[source]

pyfoma.fst.rewrite(fst: FST, *contexts, **flags)[source]

pyfoma.paradigm module

class pyfoma.paradigm.Paradigm(grammar, regexfilter, tagfilter=<function Paradigm.<lambda>>, obey_flags=True, print_flags=False)[source]

Bases: object

__init__(grammar, regexfilter, tagfilter=<function Paradigm.<lambda>>, obey_flags=True, print_flags=False)[source]

Extract a ‘paradigm’ from a grammar FST. Available as a list in attr para. regexfilter – a regex which is composed on the input side to filter out

a specific lexeme or set of lexemes, e.g. ‘run.*’

Keyword arguments: tagfilter – a function to identify tags, by default bracketed symbols [ … ] obey_flags – whether to exlcude input-output pairs with invalid flag diacritic combinations print_flags – whether to print flag diacritics in output

pyfoma package

Submodules

pyfoma.algorithms module

pyfoma.cfg module

pyfoma.eliminate_flags module

pyfoma.flag module

pyfoma.fst module

pyfoma.paradigm module

Module contents