|
Previously, string vectors were built by reading input line-by line, and
multiple copies of string vectors were made when searching.
Now, input is read into one big buffer, and string vectors only contain
references to the strings in this buffer. This both speeds up reading of
input, and avoids unnecessary copying of strings in various places.
The main downside currently is that input read from stdin is no longer
UTF-8 normalised. This means, for example, that a search for `e` won't
necessarily match `é`. Normalisation is very slow relative to the rest
of tofi, however, and not needed for most use-cases. This could either
be solved by accepting the slowdown, or making this an option, such as
--unicode or --unicode-normalize.
|