Wiki Organization

I’ve been writing a lot of technical documentation over the last few years. I want to share what I’ve learned about organizing a dev department wiki and the approach I’ve found that works well.

Annotated Link

The key to organization, and to making information easy to find, is a good annotated link.

An annotated link is a link to a wiki page and a short sentence that tells what the page is about. For example:

  • Wiki Style Guide — how to write a wiki page and keep the wiki organized.

The annotated link starts with an asterisk (so it is a bullet point), followed by the link, then two dashes, then the short sentence that tells what the page is about.

You use this annotated link wherever you reference the page.

One benefit I’ve found is that the process of writing a good annotation leads to a better wiki page, because you have a clearer idea of what the page is about. You should be able to use the annotation sentence as the topic sentence of the page.

In the annotation you use related terms and keywords so the page is easier to find by keyword searching.

Index

The next element for finding information is an index. The index contains all the annotated links in one big list. To find information in the index you use the browser text search.

As you write a new page, you add its annotated link to the bottom of the index. This gives you a relative timeline of when the pages were written. You look at the bottom to find new information.

Don’t alphabetize the list. Unlike with a book index, you can find entries instantly using text search.

One big list wouldn’t scale to Wikipedia, but it works well for a dev wiki. I’m using one with about a thousand links.

Table of Contents

The final element is the table of contents. Again you use the annotated links to build the table.

The table has a two level hierarchy: main pages first, then topic pages. The table of contents is an alphabetized list of main pages.

Here is an outline of the table of contents:

Table of Contents
  * topic a — main page for topic a
  * topic b — main page for topic b
  * topic c — main page for topic c
  …

Main Page

A “main” page contains a list of annotated links of related pages of a particular topic. You alphabetize these links. Each link points to a topic page.

A main page ends with a See Also section. It points back to the table of contents.

Below is the outline of the Wiki Information main page. It has annotated links to a few topic pages, including the Wiki Style Guide, and ends with a See Also section.

Wiki Information
  * Topic Page A — how to do topic a.
  * Topic Page B — how to do topic b.
  * Topic Page C — how to do topic c.
  * Wiki Style Guide — how to write a wiki page and keep the wiki organized.
  * Topic Page Z — how to do topic z.
  = See Also =
    * Table of Contents — the table of contents for this wiki. 

Topic Page

A topic page contains the meat of the wiki. It tells you how to do something or explains a topic.

A topic page ends with a “See Also” section. The first link of the section points back to the page’s main page. If the page belongs to multiple main pages, it has multiple links back. This is important for navigation: once you find one page of a topic, you can find all related pages through these links.

Say you have a Wiki Style Guide topic page. The topic page outline would look like the following:

Wiki Style Guide
  = Summary =
  = Section 1 =
  = Section 2 =
  = Section 3 =
  = See Also =
    * Wiki Information — main page for topics about using the wiki and writing good wiki pages.

Line Reader

For statictea I wrote a module to read lines from a file because Nim’s high level functions do not support statictea’s requirements.

Nim’s built-in line reader (the lines iterator) does not return line endings, and it puts no limit on line length.

Statictea requires that:

  • it preserves template line endings
  • it uses a small fixed amount of memory
  • its commands have a line limit
  • it reports the template name and current line number for warnings
  • it processes a template sequentially in one pass

The reader is implemented in the linebuffer module. It consists of the LineBuffer object with its readLine method.

The LineBuffer object holds the following information:

  • the stream
  • the current position in the stream
  • a fixed size buffer for lines
  • the current line number
  • the name of the stream
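
Here is a minimal sketch of what such an object might look like; the field names are illustrative, not the actual statictea definitions:

import std/streams

type
  LineBuffer = object
    stream: Stream      # the stream being read
    pos: int            # current position in the stream
    buffer: string      # fixed size buffer holding line data
    lineNum: int        # current line number, used in warnings
    filename: string    # name of the stream, used in warnings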

The readLine method returns the next line from the stream. A line is returned when the line ending is found (LF or CRLF), when the stream runs out of bytes, or when the maximum line length is reached. When no more data exists in the stream, an empty string is returned.

You can see the source here:

Example

To read lines you create a line buffer, then call its readLine method; see the example below. There might not be enough memory for the buffer, so you need to check that it was created.

let lbO = newLineBuffer(templateStream, filename = templateFilename)
if not lbO.isSome():
  # Not enough memory for the line buffer.
  return
var lb = lbO.get()
# readLine returns an empty string when the stream is exhausted.
var line = lb.readLine()
while line != "":
  processLine(line)
  line = lb.readLine()

Testing

To make testing easier, the unit tests lower the line length and buffer length when they create a LineBuffer.

You can see the tests here:

The line buffer module doesn’t have any dependencies so you could copy it into your nim project if you have similar requirements.

Unicode Evolution

The last post about the UTF-8 decoder got me thinking about how Unicode has changed and improved over time.

Originally Unicode was a two byte solution, a logical step up from one byte ASCII. The Unicode team was saying two bytes were enough for everyone. This was a great solution since all the existing encodings, single byte and multi-byte, could be replaced by Unicode.

The early adopters, like Microsoft, embraced this and spent a lot of effort to adopt and promote it. They split their text interfaces in two: one for ASCII and one for Unicode.

One of the early issues to overcome was storing a Unicode string natively on different processors, which store words (two byte units) in different orders. The Mac native order was the opposite of the PC’s. The byte order mark (BOM), code point U+FEFF, was introduced so you could detect the byte order and read the stored string correctly: the bytes FE FF mean big-endian and FF FE mean little-endian. (Note: since UTF-8 is a byte stream it doesn’t need a BOM.)

Once the Unicode team decided that two bytes were not enough for their needs, the surrogate pair workaround was introduced to support more characters while staying backward compatible. (Note: UTF-8 doesn’t use surrogate pairs.)
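
The workaround packs the extra bits into two 16 bit code units from the reserved surrogate ranges. For example, here is the arithmetic for U+1F600 (a smiley):

# Encode a code point above 0xFFFF as a UTF-16 surrogate pair.
let cp = 0x1F600'u32                 # the code point to encode
let v = cp - 0x10000'u32             # 20 bits remain
let hi = 0xD800'u32 + (v shr 10)     # high surrogate: top 10 bits
let lo = 0xDC00'u32 + (v and 0x3FF)  # low surrogate: bottom 10 bits
assert hi == 0xD83D'u32 and lo == 0xDE00'u32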

It wasn’t until later that the UTF-8 encoding was invented. What a good idea that was. The web has adopted it, as have most new applications.

UTF-8 has the advantage that you can treat it as a byte string in many cases, so most existing code doesn’t need to change. Only when you need a text Unicode feature do you need to decode the characters.
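
For example, byte oriented operations like searching and splitting work on UTF-8 text without decoding anything, because a multi-byte character never contains a plain ASCII byte:

import std/strutils

let s = "naïve, café"             # UTF-8 encoded text
assert s.find("café") >= 0        # plain byte search works
assert s.split(", ") == @["naïve", "café"]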

Python

In Python 2 the basic string type was used for both text strings and byte strings.

When they updated to 3.0, they decided to separate text strings from byte strings and treat all text strings as Unicode.

“Python 3.0 mostly uses UTF-8 everywhere, but it was not a deliberate choice and it caused many issues when the locale encoding was not UTF-8.”

They were of the opinion that being able to loop over code points and index them like ASCII was important. But they didn’t commit to one representation: a build flag controlled whether strings were stored wide (32 bit) or narrow (16 bit).

32 bit characters work since Unicode guarantees that code points will not exceed U+10FFFF. But 32 bits waste a lot of space for most text strings, so they developed ways to compress smaller strings for storage.

Both James and Victor explain well how Python’s Unicode support works and how it has evolved.

Rust

Rust is a newer language and its text strings are UTF-8. It ensures that text strings are always correctly encoded.

Nim

Nim strings are also UTF-8 encoded; however, they are allowed to hold invalid encodings. I think this is the sweet spot. You get performance since you don’t have to validate as much, and more importantly you can use strings as byte strings as well as text strings. The bulk of the code is the same for each. When you need Unicode features you can validate and pull in the special Unicode handling code.
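
For example, nothing stops a Nim string from holding bytes that are not valid UTF-8; you validate only when you need to:

import std/unicode

var s = "abc"
s.add('\xFF')                # 0xFF is never valid in UTF-8
assert s.len == 4            # len counts bytes, not characters
assert validateUtf8(s) == 3  # index of the first invalid byte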

I believe UTF-8 is the One True Encoding.

UTF-8 Decoder

I wrote the UTF-8 decoder for statictea because I found a bug in Nim’s Unicode validator. I am amazed how many common programs have bugs in their decoders.

I documented what I found in the utf8tests project. You can see the test results and the decoder in it:

For statictea, instead of referencing the decoder from the utf8tests project, I copied the utf8decoder module into statictea. When the external module is updated, the statictea build process will detect that the copied module changed and should be updated.

What’s cool about the decoder:

  • it’s fast and small
  • it passes all the tests
  • it returns byte sequences for valid and invalid cases
  • it can start in the middle, no need to start at the beginning of a string
  • it is easy to build high level functions with it

The decoder is only a few lines of code and it is table driven.

The decoder self corrects and synchronizes if you start in the middle of a character byte sequence. The first returned sequence will be invalid, but the following sequences will be valid.
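
The synchronization works because lead bytes and continuation bytes are distinguishable: every continuation byte has the form 10xxxxxx. Here is a small sketch, not the statictea code, of finding the next character boundary from an arbitrary position:

# Skip continuation bytes (10xxxxxx) to reach the next character start.
proc nextCharStart(s: string, pos: int): int =
  result = pos
  while result < s.len and (uint8(s[result]) and 0xC0) == 0x80:
    inc result

let s = "héllo"                  # the é is two bytes: 0xC3 0xA9
assert nextCharStart(s, 2) == 3  # index 2 is mid-character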

The utf8decoder module contains useful functions built around the decoder: yieldUtf8Chars, validateUtf8String, sanitizeUtf8 and utf8CharString.

It’s important that your decoder passes the tests. For example, Microsoft introduced a security issue because their decoder allowed overlong characters, that is, multiple encodings of the same character. A hacker exploited this by using a multi-byte encoding of the slash separator in a file path to access private files.
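
Here is the arithmetic behind that exploit. The slash character is byte 0x2F, and a lenient decoder that applies the two byte formula to 0xC0 0xAF produces the same 0x2F, so a path check that only looks for the plain slash byte misses it:

let b1 = 0xC0'u8
let b2 = 0xAF'u8
# Naive decode of a two byte sequence: 5 payload bits + 6 payload bits.
let codePoint = (uint32(b1 and 0x1F) shl 6) or uint32(b2 and 0x3F)
assert codePoint == 0x2F'u32  # '/', an overlong encoding
# A strict decoder rejects this: two byte sequences must encode code
# points of 0x80 or more, so lead bytes 0xC0 and 0xC1 are never valid.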

Statictea Dependencies Graphs

Statictea has two types of dependency graphs, the module dependencies and the nim system module dependencies. You can learn a lot about a project from its dependencies.

Below is the link to statictea’s source code documentation. The module dependency graph is at the top. If you scroll down to the bottom you see the Nim system module dependency graph. If you are viewing on a phone, you can pinch zoom to see more detail.

statictea’s source code docs

Module Dependencies

A graph node represents a Nim source code module (file). The lines connecting the nodes show the dependencies. For example, the parseCommandLine module on the left depends on the cmdline module.

From the module dependency graph you can see:

  • which modules you can share with other projects
  • the frequently used modules
  • the relative module file sizes
  • the project entry point
  • the stand alone modules

Modular code is easier to edit and test because of its well defined, simple interfaces.

The green nodes do not depend on any module. These are the modules you can easily share with other projects.

A red dependency line tells you that the module has only one dependency. I try to rework the code to remove the dependency when feasible. The opresult and matches modules are examples of this.

If you put all your code in one module, it wouldn’t have any dependencies! However, it would be too bloated and not granular enough for sharing with other projects. You can tell the relative size of a module by the size of its node; runCommand is the biggest module.

The comparelines, jsondocraw and stfrunner modules are stand alone commands so they are not connected in this graph.

Nim Module Graph

The Nim module graph at the bottom of the page tells you which Nim language APIs the project needs. It shows which Nim modules are used by which statictea modules.

The green Nim modules are used by only one statictea module. I strive for green modules; that way all related functions are grouped together.

Often I wrap a Nim module. The statictea wrapper module exposes a limited, simple interface tailored to statictea’s needs. This was done for the re module: the graph tells you that statictea uses only the regular expression functions in the regexes module, not the multitude of functions in the re module. The readjson module is another example of this.
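
A wrapper is just a thin module that exposes only what the project needs. As a sketch, with an illustrative proc name rather than the actual regexes interface:

# regexes.nim -- a thin wrapper over the re module (illustrative).
import std/re

proc matchPattern*(line: string, pattern: string): bool =
  ## Expose only the one matching operation the project needs.
  line.match(re(pattern))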

Building Graphs

You build the module dependency graph with the nimble task dotsrc and the Nim module graph with dotsys. Search for createDependencyGraph and createDependencyGraph2 in the nimble file.

Each graph is an SVG file. The graphviz program produces the file from a text file defining the dependencies. The nimble task creates that text file from the data generated by Nim’s genDepend command.

* https://graphviz.org/
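
The dependency text file is written in graphviz’s dot language. Here is a minimal sketch with made-up edges in the style of the real file:

digraph statictea {
  rankdir=LR;
  parseCommandLine -> cmdline;
  runCommand -> variables;
  runCommand -> messages;
}

The dot command turns a file like this into the SVG, for example: dot -Tsvg deps.dot -o deps.svg.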

StfRunner

The stfRunner command runs statictea’s system tests.

StfRunner executes a stf test file which tests a feature. A stf file contains instructions for creating files, running files and comparing files.

A normal test creates all the input files, runs statictea then compares the outputs to the expected values.

Although stfRunner was written to test statictea, you can use it to test any command line program.

What’s cool about it:

* you specify each test in one file
* a test with its documentation looks good in a markdown reader
* the test file format is simple
* you can test standard out, standard error, the error code as well as files

Below is the hello world statictea readme example. The test creates four files: “cmd.sh”, “hello.json”, “hello.html” and “stdout.expected”. It then runs the cmd.sh file, which creates two output files: “stdout” and “stderr”. The final steps compare the output files with the expected files.

stf file, version 0.1.0

# Hello World

Readme Hello World example.

### File cmd.sh command

~~~
$statictea \
-s hello.json \
-t hello.html \
>stdout 2>stderr
~~~

### File hello.html

~~~
hello {s.name}
~~~

### File hello.json

~~~
{"name": "world"}
~~~

### File stdout.expected

~~~
hello world
~~~

### Expected stdout == stdout.expected
### Expected stderr == empty


In this example, six files were created. With some other test method, it would be hard to manage all these files when you have a lot of tests.

Since the test file is a markdown file, you can create easy to read tests with embedded documentation, and it looks good in a markdown reader. You can see what the hello world test looks like viewed with GitHub’s viewer:

* hello.stf.md

Here are all the statictea system tests:

* statictea stf tests

I’m planning to spin out stfRunner into its own standalone github project.

The letters STF stand for “Single Test File” or my name STeve Flenniken.

I’m considering changing its name to tea-runner. You can think of it as t-runner for test-runner, stf-runner, or maybe rum-runner or beer-run.

Uniform Function Call Syntax

The Uniform Function Call Syntax generalizes calling a function and calling a member function.

This relatively small feature has a big impact on how you think about objects and how you reference code.

You can call a procedure in two equivalent ways: by passing all the arguments to it, or by treating the first argument like an object when calling its methods (dot notation). For example, calling procName, which takes three parameters, can be done like:

procName(a, b, c)

or

a.procName(b, c)


You can omit the parentheses when they are empty. For example, when there are no parameters:

procName2()

or

procName2

Or when there is one parameter:

procName3(a)

or

a.procName3()

or

a.procName3


The first parameter can be any type; it is not restricted to objects, and you can use variables or literals.

For an add procedure with two integer parameters, you can call it as shown below. The last line shows chaining by adding 1 + 2 + 3.

v1 = add(1, 2)
v2 = 1.add(2)
v3 = 1.add(2).add(3)
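
Here is the same example as a complete runnable snippet, with a stand-in add procedure:

proc add(a, b: int): int =
  a + b

let v1 = add(1, 2)        # normal call
let v2 = 1.add(2)         # method call syntax
let v3 = 1.add(2).add(3)  # chaining: 1 + 2 + 3
assert v1 == 3 and v2 == 3 and v3 == 6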


You can extend an object defined in another module by defining a procedure with the object as the first parameter.

newMethodName(obj, a, b, c)

And you call it like the other methods of the object:

obj.newMethodName(a, b, c)
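
For example, with a hypothetical Person object:

type Person = object
  name: string

# Defined outside the object, but callable like a method.
proc greet(p: Person, greeting: string): string =
  greeting & ", " & p.name & "!"

let p = Person(name: "Ada")
echo greet(p, "Hello")   # function call syntax
echo p.greet("Hello")    # method call syntax, the same call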

When you omit the parentheses, it looks like you are accessing a member variable. These are called getters in some languages. For example:

"str".len

The Nim tutorial talks about how to do “setters”.

* https://nim-lang.org/docs/tut2.html
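
A setter is a proc whose name ends with an equals sign. Here is a small sketch; the type and names are made up, not taken from the tutorial:

type Host = object
  fqdn: string

proc name(h: Host): string = h.fqdn          # getter
proc `name=`(h: var Host, value: string) =   # setter
  h.fqdn = value

var h = Host()
h.name = "example.com"   # calls `name=`(h, "example.com")
echo h.name              # calls name(h)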

See the Wikipedia page for more information.

* https://en.wikipedia.org/wiki/Uniform_Function_Call_Syntax

Posted in Nim