Wiki Organization

I’ve been writing a lot of technical documentation over the last few years. I want to share what I’ve learned about organizing a dev department wiki and the approach I’ve found works well.

Annotated Link

The key to keeping the wiki organized and making information easy to find is a good annotated link.

An annotated link is a link to a wiki page and a short sentence that tells what the page is about. For example:

  • Wiki Style Guide — how to write a wiki page and keep the wiki organized.

The annotated link starts with an asterisk (so it is a bullet point), then the page link, followed by two dashes and the short sentence that tells what the page is about.

You use this annotated link wherever you reference the page.

One benefit I’ve found is that writing a good annotation leads to a better wiki page, because you end up with a clearer idea of what the page is about. You should be able to use the annotation sentence as the topic sentence of the page.

In the annotation you use related terms and keywords so the page is easier to find by keyword searching.

Index

The next element for finding information is an index. The index contains all the annotated links in one big list. To find information in the index you use the browser text search.

As you write a new page, you add its annotated link to the bottom of the index. This gives you a rough timeline of when the pages were written; look at the bottom to find the newest information.

Don’t alphabetize the list. Unlike a book index, you can find any entry instantly with text search.

One big list wouldn’t scale to Wikipedia, but it works well for a dev wiki. I’m using one with about a thousand links.

Table of Contents

The final element is the table of contents. Again you use the annotated links to build the table.

The table has two levels: main pages, then topic pages. The table of contents itself is an alphabetized list of main pages.

Here is an outline of the table of contents:

Table of Contents
  * topic a — main page for topic a
  * topic b — main page for topic b
  * topic c — main page for topic c
  …

Main Page

A “main” page contains a list of annotated links to related pages on a particular topic. You alphabetize these links. Each link points to a topic page.

A main page ends with a See Also section. It points back to the table of contents.

Below is the outline of the Wiki Information main page. It has annotated links to a few topic pages, including the Wiki Style Guide, and ends with a See Also section.

Wiki Information
  * Topic Page A — how to do topic a.
  * Topic Page B — how to do topic b.
  * Topic Page C — how to do topic c.
  * Wiki Style Guide — how to write a wiki page and keep the wiki organized.
  * Topic Page Z — how to do topic z.
  = See Also =
    * Table of Contents — the table of contents for this wiki. 

Topic Page

A topic page contains the meat of the wiki. It tells you how to do something or explains a topic.

A topic page ends with a “See Also” section. The first link of the section points back to its main page. If the page appears on multiple main pages, it has multiple links back. This is important for navigation: once you find one page of a topic, you can find all related pages through these links.

Say you have a Wiki Style Guide topic page. The topic page outline would look like the following:

Wiki Style Guide
  = Summary =
  = Section 1 =
  = Section 2 =
  = Section 3 =
  = See Also =
    * Wiki Information — main page for topics about using the wiki and writing good wiki pages.

Line Reader

For statictea I wrote a module to read lines from a file because Nim’s high level functions do not support statictea’s requirements.

Nim’s built-in line reader (lines) does not return line endings, and it places no limit on line length.

Statictea requires that:

  • it preserves template line endings
  • it uses a small fixed amount of memory
  • its commands have a line limit
  • it reports the template name and current line number for warnings
  • it processes a template sequentially in one pass.

The reader is implemented in the linebuffer module. It consists of the LineBuffer object with the readline method.

The LineBuffer object holds the following information; a rough sketch of the type follows the list.

  • the stream
  • the current position in the stream
  • a fixed size buffer for lines
  • the current line number
  • the name of the stream
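
Here is what the object might look like in Nim. This is only a sketch; the field names are mine for illustration and the real module may name them differently.

import std/streams

type
  LineBuffer = object
    stream: Stream      # the stream being read
    pos: int            # current position in the stream
    buffer: string      # fixed size buffer holding bytes read from the stream
    lineNum: int        # current line number, used for warnings
    filename: string    # name of the stream, used for warnings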

The readline method returns the next line from the stream. A line is returned when the line ending is found (LF or CRLF), when the stream runs out of bytes, or when the maximum line length is reached. When no more data exists in the stream, an empty string is returned.

You can see the source here:

Example

To read lines, you create a line buffer and then call its readline method; see the example below. There might not be enough memory for the buffer, so you need to check that it was created.

let lbO = newLineBuffer(templateStream, filename = templateFilename)
if not lbO.isSome():
  # Not enough memory for the line buffer.
  return
var lb = lbO.get()
var line = lb.readLine()
while line != "":
  processLine(line)
  line = lb.readLine()

Testing

To make testing easier, the unit tests lower the line length and buffer length when they create a LineBuffer.
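
For example, a test can build a buffer over an in-memory stream with tiny limits so the edge cases trigger quickly. This is only a sketch; the maxLineLen and bufferSize parameter names are my assumption about the newLineBuffer signature.

import std/streams

# Sketch of a test setup; maxLineLen and bufferSize are assumed parameter names.
let stream = newStringStream("short\na much longer line that exceeds the limit\n")
let lbO = newLineBuffer(stream, maxLineLen = 8, bufferSize = 24, filename = "test.txt")
if lbO.isSome():
  var lb = lbO.get()
  echo lb.readLine()   # "short\n"
  echo lb.readLine()   # a chunk of at most maxLineLen bytes from the long line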

You can see the tests here:

The line buffer module doesn’t have any dependencies, so you could copy it into your Nim project if you have similar requirements.

Unicode Evolution

The last post about the UTF-8 decoder got me thinking about how Unicode has changed and improved over time.

Originally it was a two-byte solution, a logical step up from one-byte ASCII. The Unicode team said two bytes would be enough for everyone. This was a great solution since all the existing encodings, single byte and multi-byte, could be replaced by Unicode.

The early adopters, like Microsoft, embraced this and spent a lot of effort adopting and promoting it. They split their text interfaces in two, one for ASCII and one for Unicode.

One of the early issues to overcome was storing a Unicode string natively on different processors, which store words (two-byte units) differently. The Mac native order was the opposite of the PC’s. The byte order mark (BOM) was introduced so you could detect the order and read the stored string correctly. (Note: Since UTF-8 is a byte stream it doesn’t need a BOM.)

Once the Unicode team decided that two bytes were not enough for their needs, the surrogate pair workaround was introduced to support more characters while staying backward compatible. (Note: UTF-8 doesn’t use surrogate pairs.)

It wasn’t until later that the UTF-8 encoding was invented. What a good idea that was. The web has adopted it, as have most new applications.

UTF-8 has the advantage that you can treat it as a byte string in many cases. Most existing code doesn’t need to change. Only when you need a Unicode text feature do you need to decode the characters.

Python

In Python 2 the basic string type was used for both text strings and byte strings.

When they updated to 3.0, they decided to separate text strings from byte strings and treat all text strings as Unicode.

“Python 3.0 mostly uses UTF-8 everywhere, but it was not a deliberate choice and it caused many issues when the locale encoding was not UTF-8.”

They were of the opinion that being able to loop over code points and index them like ASCII characters is important. But they didn’t commit to one representation: a build flag controlled whether strings were stored wide (32 bit) or narrow (16 bit).

32-bit characters work since Unicode guarantees code points will never exceed 10FFFF. But 32 bits wastes a lot of space for most text strings, so they developed ways to compress and store smaller strings.

Both James and Victor explain well how Python Unicode support works and how it has evolved.

Rust

Rust is a newer language and its text strings are UTF-8. It ensures that text strings are always correctly encoded.

Nim

Nim strings are also UTF-8 encoded; however, they allow invalidly encoded strings. I think this is the sweet spot. You get performance since you don’t have to validate as much, and more importantly you can use strings as byte strings as well as text strings. The bulk of the code is the same for each. When you need Unicode features you can validate and pull in the special Unicode handling code.
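
As a small illustration using only Nim’s standard library (not statictea’s modules), the same string works as a byte string until you explicitly ask for a Unicode view:

import std/unicode   # pulled in only for the Unicode-aware part

let s = "héllo"      # a UTF-8 encoded literal

# Byte-string view: no decoding or validation needed.
echo s.len           # 6 bytes ('é' is two bytes)
echo ord(s[0])       # first byte as a number (104, 'h')

# Text view: decode only when you need a Unicode feature.
echo s.runeLen       # 5 code points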

I believe UTF-8 is the One True Encoding.

UTF-8 Decoder

I wrote the UTF-8 decoder for statictea because I found a bug in Nim’s Unicode validator. I am amazed how many common programs have bugs in their decoders.

I documented what I found in the utf8tests project. You can see the test results and the decoder in it:

For statictea, instead of referencing the decoder from the utf8tests project, I just copied the utf8decoder module into statictea. When the external module is updated, the statictea build process will detect that the module changed and should be updated.

What’s cool about the decoder:

  • it’s fast and small
  • it passes all the tests
  • it returns byte sequences for valid and invalid cases
  • it can start in the middle, no need to start at the beginning of a string
  • it is easy to build high level functions with it

The decoder is only a few lines of code and it is table driven.

The decoder self-corrects and synchronizes if you start in the middle of a character byte sequence. The first sequence returned will be invalid, but the following sequences will be valid.

The utf8decoder module contains the useful functions yieldUtf8Chars, validateUtf8String, sanitizeUtf8 and utf8CharString, all built around the decoder.
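
To give a feel for how these fit together, here is a rough sketch. The function names come from the module, but the signatures and return values shown are my assumptions, so check the module for the real ones.

# Sketch only: the signatures and return values below are assumptions.
let text = "abc\xffdef"                # contains an invalid UTF-8 byte
let pos = validateUtf8String(text)     # assumed: index of first invalid byte, or -1
if pos != -1:
  echo "invalid byte at position ", pos
  echo sanitizeUtf8(text)              # assumed: returns text with the invalid bytes handled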

It’s important that your decoder passes the tests. For example, Microsoft introduced a security issue because their decoder allowed overlong characters, i.e. multiple encodings for the same character. (The one-byte slash 2F can also be encoded as the overlong two-byte sequence C0 AF, which a strict decoder must reject.) This was exploited by a hacker who used a multi-byte encoding of the path separator slash to access private files.

Processing DNG

How is a JPEG generated from a DNG image file?

DNG is the standard raw format. I have been using it for many years.

Even though it adds a conversion step, I think it is worthwhile.

I load my camera raw photos onto my machine, convert them to DNG using Adobe’s raw converter, then I delete the original raws.

Like old 35mm film negatives, I save DNG files forever so I can make new images and get the best quality output. In the old days you would go to your darkroom or drugstore to have the negative made into a print. This process of enlarging, cropping and color adjustment is called processing or developing the image.

In the digital world JPEG is like the old print. It is the final result for photos. There are other formats, but none come close to JPEG.

The DNG to JPEG processing step is mostly ignored by the articles on raw and DNG files.

I hadn’t thought much about the full ramifications of the processing step until recently, even though I have been using DNG for over a decade.

My raw files contain a small preview; I set it to 1024 x 768 to save storage space.

I do not store JPEGs for my raw files. I don’t want to maintain two files for the same image. When I edit the image, it would be a pain to keep the JPEG synced, and I would have to come up with a scheme so duplicate images do not show up when browsing, among other things.

Instead, the raw files are processed on the fly when I look at them. I use an old version of Adobe Bridge as my viewer. The processing takes about two seconds on my machine the first time. After that it is fast because the processed data is cached.

I make JPEGs whenever I want to share my images: when posting to my website or other sites, or when sending them to friends.

In the old days you would write down cropping, sizing and exposure information for the developer to follow.

The equivalent with the digital negative is the embedded raw metadata. It holds the developer information.

The raw pixel data does not change when you edit in Camera Raw or Adobe Lightroom; just the embedded metadata changes. It contains the exposure, cropping, etc. as text and numbers, for example:

Exposure .90
Cropping 1,2,3,4

There is no standard way to process the DNG to make a JPEG. The software to read the raw pixel data and apply the metadata to process it is proprietary. Adobe does this in their apps like Bridge and Lightroom seamlessly. It is easy to miss this important processing step.

You do notice it if you don’t have any Adobe applications; try viewing a DNG without them. For long-term storage this is an issue.

There is a great open source command line program called dcraw that can process raw. It is used internally by a lot of apps that say they handle DNG.

IrfanView, a popular free Windows image viewer, uses dcraw. So does ImageMagick, a popular open source image library. Raw editing apps on Linux like UFRaw use it too.

The required processing metadata is documented by the DNG specification but how to interpret it is not specified.

The JPEG you get from dcraw does not look like the JPEG generated by Bridge. The exposure, cropping and rotation specified in the metadata are not honored. There are a lot of parameters to control dcraw, but none that control these options.

ImageMagick supports rotation and cropping, so if you were writing the processing code yourself you could figure out how to do this. At least these two seem straightforward. But what about exposure and shadow adjustment, color and the others? You need to be an image processing expert.

Adobe provides an SDK for dealing with DNG. I used it about seven years ago to produce JPEGs. It did not support cropping and rotation then. I’m not sure what it supports now. It was similar to dcraw and it took about the same two seconds to process.

I wonder how cameras generate their JPEGs. My new camera can take ten frames a second, and these frames are made from the raw pixel data. My guess is that the JPEG processing takes place later in a parallel process. I’ve never noticed any delay on the monitor, which needs similar processing to create the image shown.

I have been thinking about this because I am writing a photo website to handle my raw workflow.

I was planning to upload my DNGs and generate JPEGs from them as needed on the server running on Linux. There doesn’t seem to be any way to do this without rolling my own code and reverse engineering what Adobe has done. This is more than I want to do.

Google+ and Google Picasa say they support DNG. I’m not sure whether they do more than dcraw on the processing side.

Flickr doesn’t support DNG. Interesting ideas thread:

flickrideas

You can make edits in Adobe Lightroom and see them in Adobe Photoshop. It gives the illusion that there is a well-known way to process the raw data. Can you make edits in Lightroom and see them in Aperture? Adobe can share the processing software between Adobe apps, but do they share this code with other companies?

I found interesting information on Apple Aperture.

https://thephotosexpert.com/forum/raw-vs-dng/8766#.VIJHAAAgLA

From Walter Rowe
http://www.walterrowe.com/

Hi Joseph,

Let me see if I can be more clear.

Yes, Aperture can import a raw file in DNG format. And it knows the camera make and model and provides unique raw file processing based on that information. Aperture also consumes all of the meta data embedded in the DNG file like IPTC, keywords, contact info, ratings, labels, etc. All is well and good on the import side. The export side is where Aperture is incomplete regarding DNG.

Aperture can export a DNG master (eg. “Export master..”), provided it was imported as a DNG. It appears to only spit out the original DNG file you imported and any meta data you have written back to the master DNG file inside Aperture.

There seem to be two things missing from Aperture’s “Export master..” process for DNG files. The first is including an embedded, fully-rendered preview. This is an sRGB JPG that is fully baked with all your Aperture adjustments. Second is all of the adjustment settings themselves. I think Apple could add both of these easily enough if they chose.

The Adobe products include these pieces of data. I can “edit” a DNG in Lightroom, save everything back to the DNG file, open it in Photoshop, and see all of my adjustments from Lightroom. Likewise, I can make adjustments to a DNG file in Adobe Camera Raw, save them, and see these adjustments in Lightroom. And both products will embed the updated, baked preview inside the DNG file.

The embedded preview can be consumed by image management tools like MediaOnePro (formerly iView Media Pro). This frees image mgmt tools from needing to know how to interpret and render raw sensor data from different camera makers, and lets these tools include color-accurate thumbnails and previews in the image management database.

The DNG file format is nice. It retains all of the manufacturer’s original raw file data, can include the original raw file itself, can incorporate a “baked” preview with all your adjustments, can include all of the raw adjustments, and can include all of your meta data. It is a nicely packaged file format with everything you need for long term image management.

It would be nice to see Apple fully support all the features of the DNG file format in the “Export master..” process.

More information about DNG can be found on Adobe’s DNG page.

Does that help?
Walter

From a practical point of view I can get JPEGs by creating them with Adobe apps and uploading them to my website. But this adds manual steps I would rather not have.

I am beginning to think that the best solution for my website is to change the raw preview setting so a full resolution JPEG is embedded in my DNG files.

When you edit a raw file, the preview needs to be updated to match; Adobe applications do this.

I need to figure out an easy way to update all my DNG files to have full resolution previews. Once that is done all my website needs to do is extract the embedded previews.

Words for Rain

They say that Eskimos have 100 words for snow. As a native of Seattle, I wonder how many words for rain I know. This is the list I came up with.

Rain
Showers
Mist
Drizzle
Downpour
Cloudburst
Sleet
Torrential downpour
Monsoon
Pouring
Cats and dogs
Pineapple Express
Freezing rain
Sprinkle
Liquid sunshine
Deluge
Drencher
Flood
Flurry
Precipitation
Raindrops
Rainfall
Rainstorm
Sheets
Wet weather
Squall

Firefox backspace

On Ubuntu, the Firefox backspace key does nothing by default. You can change a setting to make it go to the previous page. Firefox has a bunch of settings you can access by typing about:config in the address bar. Changing browser.backspace_action from 2 to 0 does the trick.

When you type about:config in the address bar you get an interesting warning dialog.

After you promise to be careful, type browser.backspace_action into the search box. Then change the 2 to a 0.

Load Emacs Documentation on your Kindle

There is a lot of free content you can load onto your Kindle. It’s just a matter of converting it to the right format for your Kindle. It is a great device for reading manuals. I’ve loaded my Kindle with all the Python documentation as well as the Emacs documentation. Here I show how to load the Emacs manual. You can use the same method for others.

These steps can be done on the Windows and Macintosh platforms as well as Ubuntu. Here are the steps in detail for Ubuntu.

Configuration:

  • Kindle 2, Model D00511
  • Ubuntu 10.04 LTS

Download Emacs Single Page Document

Emacs provides a single page HTML file of all the documentation. Download this page to your machine.

cd ~/tmp

wget http://www.gnu.org/software/emacs/manual/html_mono/emacs.html

You can see the file in the tmp folder.

ll -h
-rw-r--r-- 1 steve steve 3.3M 2011-09-25 11:55 emacs.html

Note: The Python documentation comes in a zip file of many HTML files. You can create an ebook for your Kindle from all these files; just point the converter at the content.html file.

Install calibre

You use calibre to convert the HTML file to a book (mobi file). If you don’t have calibre, you can install it with:

sudo apt-get install calibre

Here is the current version:

ebook-convert --version
ebook-convert (calibre 0.8.20)
Created by: Kovid Goyal <kovid@kovidgoyal.net>

Note: There are many options you can specify when converting from html to mobi. The options are listed here:
http://manual.calibre-ebook.com/cli/ebook-convert-3.html#html-input-to-mobi-output

Convert HTML to Mobi

If you use the defaults for conversion, the index at the beginning of the book is one big table that runs for many pages. You cannot select any of its links, so it is pretty useless. To fix this, specify the linearize-tables option and the tables will be converted to regular text.

Convert the emacs.html file into a mobi file and turn the tables into regular text. It takes a few minutes to convert:

ebook-convert emacs.html emacs.mobi --linearize-tables --title "The Emacs Editor" --authors gnu.org --comments "The Emacs Editor Kindle book generated from gnu.org single page HTML file."
 

You can see the file created:

ll -h
total 4.8M
-rw-r--r-- 1 steve steve 3.3M 2011-09-25 11:55 emacs.html
-rw-r--r-- 1 steve steve 1.5M 2011-09-25 13:20 emacs.mobi

Copy to Kindle

Copy the book to your Kindle. Plug it into the USB port then:

cp emacs.mobi /media/Kindle/documents/
eject Kindle

The content page generated by calibre is at the end of the document. The book is called “The Emacs Editor”.

Now you can store the manual under your pillow.