Markovian models with hidden state are widely-used formalisms for modeling sequential phenomena. Learnability of these models has been well studied when the sample is given in batch mode, and algorithms with PAC-like learning guarantees exist for specic classes of models such as Probabilistic Deterministic Finite Automata (PDFA). Here we focus on PDFA and give an algorithm for infering models in this class under the stringent data stream scenario: unlike existing methods, our algorithm works incrementally and in one pass, uses memory sublinear in the stream length, and processes input items in amortized constant time. We provide rigorous PAC-like bounds for all of the above, as well as an
evaluation on synthetic data showing that the algorithm performs well in practice. Our
algorithm makes a key usage of several old and new sketching techniques. In particular, we develop a new sketch for implementing bootstrapping in a streaming setting which may be of independent interest. In experiments we have observed that this sketch yields important reductions in the examples required for performing some crucial statistical tests in our algorithm.
Best Student Paper ICGI 2012
CitationB. Balle; Castro, J.; Gavaldà, R. Bootstrapping and learning PDFA in data streams. A: International Colloquim on Grammatical Inference. "Proceedings of the Eleventh International Conference on Grammatical Inference". Washington: 2012, p. 34-48.
All rights reserved. This work is protected by the corresponding intellectual and industrial property rights. Without prejudice to any existing legal exemptions, reproduction, distribution, public communication or transformation of this work are prohibited without permission of the copyright holder. If you wish to make any use of the work not provided for in the law, please contact: email@example.com