pyfoma package
Submodules
pyfoma.algorithms module
pyfoma.cfg module
Context-Free Grammar tools
pyfoma.eliminate_flags module
pyfoma.flag module
- class pyfoma.flag.FlagOp(sym: str)[source]
Bases:
object
- __init__(sym: str)[source]
Creates a Flag diacritic
- Parameters:
sym – String representation of flag diacritic
The parameter ‘sym’ should follow the format [[XYZ]], for example “[[$Num=Sg]]”, where:
1. X is a variable name matching the regex “[$]\w+”
2. Y is one of the operators “=” (set value), “==” (check that value equals), “!=” (check that value does not equal) or “$=” (unify to value)
3. Z is a value matching the regex “[$]?\w+”. If the value starts with $, then it refers to a variable.
More formally, any flag diacritic needs to match the expression flag.FLAGRE.
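The format described above can be approximated with an ordinary Python regular expression. The following is a minimal standalone sketch built only from the three rules listed for X, Y and Z; the names `FLAG_PATTERN` and `parse_flag` are hypothetical, and the pattern is not the library's `flag.FLAGRE`, which may differ in detail.

```python
import re

# Hypothetical approximation of the documented [[XYZ]] format.
# This is NOT pyfoma's flag.FLAGRE; it only encodes the three rules above.
FLAG_PATTERN = re.compile(
    r"^\[\[(?P<var>[$]\w+)(?P<op>==|!=|[$]=|=)(?P<val>[$]?\w+)\]\]$"
)

def parse_flag(sym):
    """Return (variable, operator, value), or None if sym is not flag-shaped."""
    m = FLAG_PATTERN.match(sym)
    if m is None:
        return None
    return m.group("var"), m.group("op"), m.group("val")

print(parse_flag("[[$Num=Sg]]"))       # ('$Num', '=', 'Sg')
print(parse_flag("[[$Num==$Other]]"))  # ('$Num', '==', '$Other')
print(parse_flag("cat"))               # None
```

The two-character operators are listed before the bare “=” in the alternation so that “==” is not read as “=” followed by a stray character.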
- static filter_flags(seq: Sequence[str]) Sequence[str] [source]
Filter out flag diacritics from symbol sequence
- class pyfoma.flag.FlagStreamFilter(alphabet: Set[str])[source]
Bases:
FlagFilter
- __init__(alphabet: Set[str])[source]
Create FlagStreamFilter from an FST alphabet
- Parameters:
alphabet – A symbol set (containing strings)
- class pyfoma.flag.FlagStringFilter(alphabet: Set[str])[source]
Bases:
FlagFilter
pyfoma.fst module
- class pyfoma.fst.FST(label: Tuple | None = None, weight=0.0, alphabet={})[source]
Bases:
object
- __init__(label: Tuple | None = None, weight=0.0, alphabet={})[source]
Creates an FST-structure with a single state.
- Parameters:
label – create a two-state FST that accepts label
weight – add a weight to the final state
alphabet – declare an alphabet explicitly
If ‘label’ is given, a two-state automaton is created with label as the only transition from the initial state to the final state.
If ‘weight’ is also given, the final state will have that weight. Labels are always tuples internally, so a two-state automaton that only accepts ‘a’ should have label = (‘a’,).
If label is the empty string, i.e. (‘’,), the second state will not be created, but the initial state will be made final with weight ‘weight’.
- alphabet
The alphabet used by the FST
- analyze(word, weights=False, tokenize_outputs=False, obey_flags=True, print_flags=False)[source]
Pass word through FST and return generator that yields all inputs.
- apply(word, inverse=False, weights=False, tokenize_outputs=False, obey_flags=True, print_flags=False)[source]
Pass word through FST and return generator that yields outputs. If inverse is True, map from range to domain. weights is False by default; set it to True to see the cost. obey_flags toggles whether invalid flag diacritic combinations are filtered out. By default, flags are treated as epsilons in the input. print_flags toggles whether flag diacritics are printed in the output.
- become(other: FST)[source]
Hacky or what? We use this to mutate self for those algorithms that don’t directly do it.
- classmethod character_ranges(ranges, complement=False) FST [source]
Returns a two-state FSM from a list of Unicode code point range pairs. Keyword arguments: complement – if True, the character class is negated, i.e. [^ … ]; a two-state FST is returned with the single label ‘.’, and all the symbols in the character class are put in the alphabet.
- cleanup_sigma() FST [source]
Remove symbols if they are no longer needed, including ‘.’. Returns a new FST with the cleaned alphabet.
- compose(fst2: FST) FST [source]
Composition of A,B; will expand an acceptor into 2-tape FST on-the-fly.
- concatenate(fst2: FST) FST [source]
Concatenation of T1T2. No epsilons. May produce non-accessible states.
- context_restrict(*contexts, rewrite=False) FST [source]
Only allow self in the context L1 _ R1, or … , or L_n _ R_n.
- copy_filtered(labelfilter=<function FST.<lambda>>)[source]
Create a copy of self, possibly filtering out labels where the optional function ‘labelfilter’ returns False.
- copy_mod(modlabel=<function FST.<lambda>>, modweight=<function FST.<lambda>>)[source]
Copies an FSM and possibly modifies labels and weights through functions. Keyword arguments: modlabel – a function that modifies the label, takes label, weight as args. modweight – a function that modifies the weight, takes label, weight as args.
- cross_product(fst2: FST, optional: bool = False) FST [source]
Perform the cross-product of T1, T2 through composition. Keyword arguments: optional – if True, calculates T1:T2 | T1.
- determinize(staterep=<function FST.<lambda>>, oplus=<built-in function min>) FST [source]
Weighted determinization of FST.
- determinize_as_dfa() FST [source]
Determinize as a DFA with weight as part of label, then apply unweighted det.
- filter_coaccessible() FST [source]
Remove states and transitions to states that have no path to a final state.
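The idea of coaccessibility can be illustrated without the library. The following is a minimal standalone sketch, assuming a graph given as plain (source, target) pairs; the function `coaccessible_states` is a hypothetical helper and does not show how pyfoma implements this method. It walks the transitions backwards from the final states and keeps every state it reaches.

```python
from collections import deque

def coaccessible_states(transitions, finals):
    """Return the set of states from which some final state is reachable.

    transitions: iterable of (source, target) pairs
    finals: set of final states
    """
    # Build a reverse adjacency map: target -> set of sources.
    reverse = {}
    for src, dst in transitions:
        reverse.setdefault(dst, set()).add(src)

    # Breadth-first search backwards from the final states.
    seen = set(finals)
    queue = deque(finals)
    while queue:
        state = queue.popleft()
        for pred in reverse.get(state, ()):
            if pred not in seen:
                seen.add(pred)
                queue.append(pred)
    return seen

edges = [(0, 1), (1, 2), (0, 3), (3, 3)]
keep = coaccessible_states(edges, finals={2})
pruned = [(s, t) for s, t in edges if s in keep and t in keep]
print(sorted(keep))  # [0, 1, 2]
print(pruned)        # [(0, 1), (1, 2)]
```

State 3 only loops on itself and never reaches the final state 2, so both it and the transition leading into it are dropped.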
- finalstates
A set of all final (accepting) states of the FST
- classmethod from_strings(strings, multichar_symbols=None)[source]
Create an automaton that accepts words in the iterable ‘strings’.
- generate(word, weights=False, tokenize_outputs=False, obey_flags=True, print_flags=False)[source]
Pass word through FST and return generator that yields all outputs.
- initialstate
The initial (start) state of the FST
- kleene_closure(mode='star') FST [source]
Apply self*. No epsilons here. If mode == ‘plus’, calculate self+.
- label_states_topology(mode='BFS') FST [source]
Topologically sort and label states with numbers. Keyword arguments: mode – ‘BFS’, i.e. breadth-first search by default. ‘DFS’ is depth-first.
- classmethod load(path: str) FST [source]
Loads an FST from a .fst file.
- Parameters:
path (str) – The path to load from. Must be a .fst file
- map_labels(map: dict) FST [source]
Relabel the transducer with new labels from dictionary mapping.
Example: fst.map_labels({'a': '', 'b': 'a'})
- merge_equivalent_states(equivalenceclasses: set) FST [source]
Merge equivalent states given as a set of sets.
- minimize_as_dfa() FST [source]
Minimize as a DFA with weight as part of label, then apply unweighted min.
- number_unnamed_states(force=False) dict [source]
Sequentially number those states that don’t have the ‘name’ attribute. If ‘force == True’, number all states.
- product(fst2: ~pyfoma.fst.FST, finalf=<built-in function any>, oplus=<built-in function min>, pathfollow=<function FST.<lambda>>) FST [source]
Generates the product FST from fst1, fst2. The helper functions by default produce fst1|fst2.
- classmethod re(regularexpression, defined={}, functions={}, multichar_symbols=None)
Compile a regular expression and return the resulting FST.
Keyword arguments:
defined – a dictionary of defined FSTs that the compiler can access whenever a defined network is referenced in the regex, e.g. $vowel
functions – a set of Python functions that the compiler can access when a function is referenced in the regex, e.g. $^myfunc(…)
- classmethod regex(regularexpression, defined={}, functions={}, multichar_symbols=None)[source]
Compile a regular expression and return the resulting FST.
Keyword arguments:
defined – a dictionary of defined FSTs that the compiler can access whenever a defined network is referenced in the regex, e.g. $vowel
functions – a set of Python functions that the compiler can access when a function is referenced in the regex, e.g. $^myfunc(…)
- render(view=True, filename: str = 'FST', format='pdf', tight=True)[source]
Renders the FST to a file and optionally opens the file.
- Parameters:
view – If True, the rendered file will be opened.
format – The file format for the Digraph. Typically ‘pdf’, ‘png’, or ‘svg’. View all formats: https://graphviz.org/docs/outputs/
tight – If False, the rendered file will have whitespace margins around the graph.
- rewrite(*contexts, **flags) FST [source]
Rewrite self in contexts in parallel, controlled by flags.
- classmethod rlg(grammar, startsymbol, multichar_symbols=None)[source]
Compile a (weighted) right-linear grammar into an FST, similarly to lexc.
- save(path: str)[source]
Saves the current FST to a file.
- Parameters:
path (str) – The path to save to (without a file extension)
- save_att(base: PathLike, state_symbols=False, epsilon='@0@')[source]
Save to AT&T format files for use with other FST libraries (Foma, OpenFST, RustFST, HFST, etc).
This will, in addition to saving the transitions in base, also create separate files with the extensions .isyms and .osyms containing the input and output symbol tables (so for example if base is test.fst, it will create test.isyms and test.osyms)
Note also that the AT&T format has no mechanism for quoting or escaping characters (notably whitespace) in symbols and state names, but only tabs are used as field separators by default, so any other characters should be acceptable (though not always recommended). The symbol @0@ is used by default for epsilon (but can be changed with the epsilon parameter) as this is Foma’s default, and will always have symbol ID 0 as this is required by OpenFST.
If state_symbols is true, the names of states will be retained in the output file and a state symbol table created with the extension .ssyms. This option is disabled by default since it is not compatible with Foma.
Note also that unreachable states are not included in the output.
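To make the file layout concrete, here is a minimal standalone sketch of the general AT&T text format and of a symbol table in which epsilon gets ID 0. It does not call pyfoma; the helpers `att_lines` and `symbol_table`, the tuple layout of the inputs, and the use of an empty string for epsilon are all assumptions made for illustration, and the exact files written by `save_att` may differ.

```python
def att_lines(transitions, finals, epsilon="@0@"):
    """Format transitions and final states as tab-separated AT&T lines.

    transitions: iterable of (source, target, input, output, weight)
    finals: iterable of (state, weight)
    An empty string stands for epsilon in this sketch.
    """
    lines = []
    for src, dst, isym, osym, weight in transitions:
        isym = isym or epsilon
        osym = osym or epsilon
        lines.append(f"{src}\t{dst}\t{isym}\t{osym}\t{weight}")
    for state, weight in finals:
        lines.append(f"{state}\t{weight}")
    return lines

def symbol_table(symbols, epsilon="@0@"):
    """Build symbol-table lines, reserving ID 0 for epsilon."""
    table = [f"{epsilon}\t0"]
    ordinary = sorted(s for s in symbols if s and s != epsilon)
    for ident, sym in enumerate(ordinary, start=1):
        table.append(f"{sym}\t{ident}")
    return table

transitions = [(0, 1, "a", "b", 0.0), (1, 2, "", "c", 1.5)]
for line in att_lines(transitions, finals=[(2, 0.0)]):
    print(line)
for line in symbol_table({"a", ""}):
    print(line)
```

Because tabs are the only field separators, a symbol containing a space would still be written unchanged, which matches the note above about the lack of quoting or escaping.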
- states
A set of all states in the FST
- todict() Dict[str, Any] [source]
Create a dictionary form of the FST for export to JSON. May be post-processed for optimization in Javascript.
- tokenize_against_alphabet(word) list [source]
Tokenize a string using the alphabet of the automaton.
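One common way to tokenize against an alphabet that contains multi-character symbols is greedy longest-match. The following standalone sketch shows that strategy; whether pyfoma uses exactly this approach is an assumption, and `tokenize_longest_match` is a hypothetical helper, not a library function.

```python
def tokenize_longest_match(word, alphabet):
    """Split word into symbols, preferring the longest alphabet symbol at each position."""
    # Try longer symbols first so that '[Noun]' wins over '['.
    symbols = sorted((s for s in alphabet if s), key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(word):
        for sym in symbols:
            if word.startswith(sym, i):
                tokens.append(sym)
                i += len(sym)
                break
        else:
            # No alphabet symbol matches here: keep the single character.
            tokens.append(word[i])
            i += 1
    return tokens

alphabet = {"c", "a", "t", "[Noun]", "[Pl]"}
print(tokenize_longest_match("cat[Noun][Pl]", alphabet))
# ['c', 'a', 't', '[Noun]', '[Pl]']
```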
- view(raw=False, show_weights=False, show_alphabet=True) graphviz.Digraph [source]
Creates a ‘graphviz.Digraph’ object to view the FST. Will automatically display the FST in Jupyter.
- Parameters:
raw – if True, show label tuples and weights unformatted
show_weights – force display of weights even if 0.0
show_alphabet – displays the alphabet below the FST
- Returns:
A Digraph object which will automatically display in Jupyter.
If you would like to display the FST from a non-Jupyter environment, please use FST.render.
pyfoma.paradigm module
- class pyfoma.paradigm.Paradigm(grammar, regexfilter, tagfilter=<function Paradigm.<lambda>>, obey_flags=True, print_flags=False)[source]
Bases:
object
- __init__(grammar, regexfilter, tagfilter=<function Paradigm.<lambda>>, obey_flags=True, print_flags=False)[source]
Extract a ‘paradigm’ from a grammar FST. Available as a list in attr para.
regexfilter – a regex which is composed on the input side to filter out a specific lexeme or set of lexemes, e.g. ‘run.*’
Keyword arguments:
tagfilter – a function to identify tags, by default bracketed symbols [ … ]
obey_flags – whether to exclude input-output pairs with invalid flag diacritic combinations
print_flags – whether to print flag diacritics in output