nim | Steve Flenniken

Line Reader

For statictea I wrote a module to read lines from a file because Nim’s high-level functions do not support statictea’s requirements.

Nim’s built-in line reader (lines) strips the line endings from the lines it returns, and it places no limit on line length.
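
A minimal sketch of that behavior, using the lines iterator from std/streams on an in-memory stream. The helper name collectLines and the sample text are made up for illustration:

```nim
import std/streams

# Hypothetical helper: collect what the standard lines iterator yields.
proc collectLines(text: string): seq[string] =
  let stream = newStringStream(text)
  for line in lines(stream):
    result.add(line)
  stream.close()

# One crlf line and one lf line. Both endings are stripped, so the
# original line endings cannot be reproduced from the output.
let got = collectLines("first\r\nsecond\n")
doAssert got == @["first", "second"]
```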

Statictea requires that:

it preserves template line endings
it uses a small fixed amount of memory
its commands have a line limit
it reports the template name and current line number for warnings
it processes a template sequentially in one pass.

The reader is implemented in the linebuffer module. It consists of the LineBuffer object with the readline method.

The LineBuffer object holds the following information.

the stream
the current position in the stream
a fixed size buffer for lines
the current line number
the name of the stream

The readline method returns the next line from the stream. A line is returned when the line ending is found (lf or crlf), when the stream runs out of bytes or when the maximum line length is reached. When no more data exists in the stream, an empty string is returned.
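
The three return conditions can be sketched roughly as follows. This is a simplified stand-in and not the code in the linebuffer module: it reads one character at a time instead of using a fixed size buffer, and the names readLineKeepEnding and maxLineLen are hypothetical.

```nim
import std/streams

# Hypothetical helper, not the linebuffer module's code: return the next
# line including its ending, or at most maxLineLen bytes of it.
proc readLineKeepEnding(stream: Stream, maxLineLen = 8): string =
  while result.len < maxLineLen and not stream.atEnd():
    let ch = stream.readChar()
    result.add(ch)
    if ch == '\n':
      break  # Found lf, or the lf that completes a crlf.

let stream = newStringStream("one\r\ntwo\nlonger than eight")
doAssert readLineKeepEnding(stream) == "one\r\n"
doAssert readLineKeepEnding(stream) == "two\n"
doAssert readLineKeepEnding(stream) == "longer t"  # Hit the length limit.
doAssert readLineKeepEnding(stream) == "han eigh"
doAssert readLineKeepEnding(stream) == "t"         # Stream ran out of bytes.
doAssert readLineKeepEnding(stream) == ""          # No more data.
```

Because each returned piece keeps its ending, concatenating the pieces gives back the original bytes.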

You can see the source here:

linebuffer.nim

Example

To read lines, create a line buffer and then call its readline method, as in the example below. There might not be enough memory for the buffer, so you need to check that it was created.

let lbO = newLineBuffer(templateStream, filename=templateFilename)
if not lbO.isSome():
  # Not enough memory for the line buffer.
  return
var lb = lbO.get()
while true:
  let line = lb.readLine()
  if line == "":
    # An empty string means no more data exists in the stream.
    break
  processLine(line)

Testing

To make testing easier, the unit tests lower the line length and buffer length when they create a LineBuffer.

You can see the tests here:

test_linebuffer.nim

The line buffer module doesn’t have any dependencies, so you could copy it into your Nim project if you have similar requirements.

UTF-8 Decoder

I wrote the UTF-8 decoder for statictea because I found a bug in Nim’s Unicode validator. I am amazed how many common programs have bugs in their decoders.

I documented what I found in the utf8tests project. You can see the test results and the decoder in it:

utf8tests

For statictea, instead of referencing the decoder from the utf8tests project, I just copied the utf8decoder module into statictea. When the external module is updated, the statictea build process reports that the module changed and that the copy should be updated.

What’s cool about the decoder:

it’s fast and small
it passes all the tests
it returns byte sequences for valid and invalid cases
it can start in the middle, no need to start at the beginning of a string
it is easy to build high level functions with it

The decoder is only a few lines of code and it is table driven.

The decoder self-corrects and synchronizes if you start in the middle of a character byte sequence. The first return will be invalid, but the following return sequences will be valid.
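
Resynchronizing is possible because UTF-8 continuation bytes always have the bit pattern 10xxxxxx, so they can be told apart from the bytes that start a character. The sketch below only illustrates that property. It is not the table-driven decoder, and the names isContinuation and nextCharStart are hypothetical:

```nim
# Hypothetical helper: true for a UTF-8 continuation byte (10xxxxxx).
proc isContinuation(b: char): bool =
  (ord(b) and 0xC0) == 0x80

# Return the index of the first byte at or after start that can begin
# a character, skipping any leftover continuation bytes.
proc nextCharStart(s: string, start: int): int =
  result = start
  while result < s.len and isContinuation(s[result]):
    inc result

let text = "a\xE2\x82\xACb"  # "a", the three-byte euro sign, "b"
doAssert nextCharStart(text, 2) == 4  # Started mid-character; resumes at "b".
doAssert nextCharStart(text, 1) == 1  # Already on a lead byte.
```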

The utf8decoder module contains the useful functions yieldUtf8Chars, validateUtf8String, sanitizeUtf8 and utf8CharString, all built around the decoder.

It’s important that your decoder passes the tests. For example, Microsoft introduced a security issue because their decoder allowed overlong characters, i.e. it allowed multiple encodings for the same character. An attacker exploited this by using a multi-byte encoding of the slash separator in a file path to access private files.
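
To see why overlong encodings are dangerous, here is a small sketch of a naive two-byte decode next to the range check that a strict decoder applies. The function names are hypothetical and the byte values are chosen for illustration; this is not code from the utf8decoder module:

```nim
# Naive two-byte decode: combine the payload bits without range checks.
proc naiveDecode2(b0, b1: int): int =
  ((b0 and 0x1F) shl 6) or (b1 and 0x3F)

# A strict decoder rejects two-byte sequences whose value fits in one byte.
proc isOverlong2(b0, b1: int): bool =
  naiveDecode2(b0, b1) < 0x80

# 0xC0 0xAF is an overlong encoding of "/" (U+002F).
doAssert naiveDecode2(0xC0, 0xAF) == ord('/')
doAssert isOverlong2(0xC0, 0xAF)
# 0xC3 0xA9 is the normal two-byte encoding of U+00E9.
doAssert naiveDecode2(0xC3, 0xA9) == 0xE9
doAssert not isOverlong2(0xC3, 0xA9)
```

A path check that looks only for the single byte 0x2F misses the two-byte form, while a naive decoder later turns it back into a slash.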
