Wiki Organization

I’ve been writing a lot of technical documentation the last few years. I want to share what I’ve learned about organizing a dev department wiki and the way I’ve found that works well.

Annotated Link

The key to organization and making it easier to find information is a good annotated link.

An annotated link is a link to a wiki page and a short sentence that tells what the page is about. For example:

  • Wiki Style Guide — how to write a wiki page and keep the wiki organized.

The annotated link starts with an asterisk (so it is a bullet point), then the link, followed by two dashes and then the short sentence that tells what the page is about.

You use this annotated link wherever you reference the page.

One benefit I’ve found is that in the process of writing a good annotation you end up writing a better wiki page, because you have a better idea of what it is about. You should be able to use the annotation sentence as the topic sentence of the page.

In the annotated link you use related terms and keywords so the page is easier to find by keyword searching.

Index

The next element for finding information is an index. The index contains all the annotated links in one big list. To find information in the index you use the browser text search.

As you write a new page, you add its annotated link to the bottom of the index. This gives you a relative time line history of when the pages were written. You look at the bottom to find new information.

Don’t alphabetize the list. Unlike with a book index, you can instantly find an entry using text search.

One big list wouldn’t scale to Wikipedia but it works well for a dev wiki. I’m using one with about a thousand links.
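As a rough sketch of the idea, the index can be modeled as an append-only list of annotated links that you search by keyword. The AnnotatedLink type and the search procedure are hypothetical names made up for this illustration; a real wiki does this with its own markup and the browser text search.

```nim
import strutils

type
  AnnotatedLink = object
    title: string       # the page the link points to
    annotation: string  # short sentence that tells what the page is about

proc `$`(link: AnnotatedLink): string =
  ## Format the link as a bullet: asterisk, title, two dashes, annotation.
  result = "* " & link.title & " -- " & link.annotation

proc search(index: seq[AnnotatedLink], keyword: string): seq[AnnotatedLink] =
  ## Return the links whose text contains the keyword, ignoring case.
  for link in index:
    if ($link).toLowerAscii().contains(keyword.toLowerAscii()):
      result.add(link)

var index: seq[AnnotatedLink]
# New pages go at the bottom, so the list order is the writing order.
index.add(AnnotatedLink(title: "Wiki Style Guide",
  annotation: "how to write a wiki page and keep the wiki organized."))
index.add(AnnotatedLink(title: "Topic Page A", annotation: "how to do topic a."))

for link in search(index, "style"):
  echo link
```

Because the list is never sorted, the last entry is always the newest page.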

Table of Contents

The final element is the table of contents. Again you use the annotated links to build the table.

The hierarchy of the table has two levels: first main pages, then topic pages. The table of contents is an alphabetized list of main pages.

Here is an outline of the table of contents:

Table of Contents
  * topic a — main page for topic a
  * topic b — main page for topic b
  * topic c — main page for topic c
  …

Main Page

A “main” page contains a list of annotated links of related pages of a particular topic. You alphabetize these links. Each link points to a topic page.

A main page ends with a See Also section. It points back to the table of contents.

Below is the outline of the Wiki Information main page. It has annotated links to a few topic pages including the Wiki Style Guide and ending with a see also section.

Wiki Information
  * Topic Page A — how to do topic a.
  * Topic Page B — how to do topic b.
  * Topic Page C — how to do topic c.
  * Wiki Style Guide — how to write a wiki page and keep the wiki organized.
  * Topic Page Z — how to do topic z.
  = See Also =
    * Table of Contents — the table of contents for this wiki. 

Topic Page

A topic page contains the meat of the wiki. It tells you how to do something or explains a topic.

A topic page ends with a “See Also” section. The first link of the section contains a link back to its main page. If the page is in multiple main pages, it has multiple links back. This is important for navigation. Once you find one page of a topic, you can find all related pages through these links.

Say you have a Wiki Style Guide topic page. The topic page outline would look like the following:

Wiki Style Guide
  = Summary =
  = Section 1 =
  = Section 2 =
  = Section 3 =
  = See Also =
    * Wiki Information — main page for topics about using the wiki and writing good wiki pages.

Line Reader

For statictea I wrote a module to read lines from a file because nim’s high level functions do not support statictea’s requirements.

Nim’s builtin line reader (lines) does not return line endings and it doesn’t care about line lengths.

Statictea requires that:

  • it preserves template line endings
  • it uses a small fixed amount of memory
  • its commands have a line limit
  • it reports the template name and current line number for warnings
  • it processes a template sequentially in one pass.

The reader is implemented in the linebuffer module. It consists of the LineBuffer object with the readline method.

The LineBuffer object holds the following information.

  • the stream
  • the current position in the stream
  • a fixed size buffer for lines
  • the current line number
  • the name of the stream

The readline method returns the next line from the stream. A line is returned when the line ending is found (lf or crlf), when the stream runs out of bytes or when the maximum line length is reached. When no more data exists in the stream, an empty string is returned.
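The three stopping conditions can be sketched with Nim's streams module. This is a simplified illustration, not the linebuffer module: readLineKeepEnding is a hypothetical name, it reads one byte at a time, and it does not use a fixed size buffer or track line numbers.

```nim
import streams

proc readLineKeepEnding(stream: Stream, maxLen: int): string =
  ## Return the next line with its ending, at most maxLen bytes long.
  ## Return an empty string when the stream has no more data.
  while result.len < maxLen and not stream.atEnd():
    let ch = stream.readChar()
    result.add(ch)
    if ch == '\n':
      break  # lf or crlf found; the ending stays in the line

let stream = newStringStream("one\r\ntwo\nlong line\nlast")
var lines: seq[string]
while true:
  let line = readLineKeepEnding(stream, maxLen = 6)
  if line == "":
    break
  lines.add(line)

# "long line\n" is split because it exceeds the six byte limit.
doAssert lines == @["one\r\n", "two\n", "long l", "ine\n", "last"]
```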

You can see the source here:

Example

To read lines you create a line buffer and then call its readline method, as in the example below. There might not be enough memory for the buffer, so you need to check that it got created.

let lbO = newLineBuffer(templateStream, filename=templateFilename)
if not lbO.isSome():
  # Not enough memory for the line buffer.
  return
var lb = lbO.get()
while true:
  # An empty string means there is no more data in the stream.
  let line = lb.readline()
  if line == "":
    break
  processLine(line)

Testing

To make testing easier the unit tests lower the line length and buffer length when they create a LineBuffer.

You can see the tests here:

The line buffer module doesn’t have any dependencies so you could copy it into your nim project if you have similar requirements.

Unicode Evolution

The last post about the UTF-8 decoder got me thinking about how unicode has changed and improved over time.

Originally it was a two byte solution which is a logical step up from one byte ascii. The unicode team was saying two bytes was enough for all. This was a great solution since all the existing different encodings, single byte and multi-byte, could be replaced by unicode.

The early adopters, like Microsoft, embraced this and spent a lot of effort to adopt and promote it. They split their text interfaces in two, one for ascii and one for unicode.

One of the early issues to overcome was storing a unicode string natively on different processors which store words (two byte units) differently. The Mac native order was the opposite of the PC. The byte order mark (BOM) character was introduced so you could detect the order and read in the stored string correctly. (Note: Since UTF-8 is a byte stream it doesn’t need a BOM.)

Once the unicode team decided that two bytes were not enough for their needs, the surrogate pair workaround was introduced to support more characters and be backward compatible. (Note: UTF-8 doesn’t use surrogate pairs.)

It wasn’t until later that UTF-8 encoding was invented. What a good idea that was. The web world has adopted it as well as most new applications.

UTF-8 has the advantage that you can treat it as a byte string in many cases. Most existing code doesn’t need to change. Only when you need a text unicode feature do you need to decode the characters.
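Here is a small sketch of that point in Nim. The sample text is made up, and the accented characters are written as explicit byte escapes so the byte counts are unambiguous.

```nim
import strutils
from unicode import runeLen

# The words "naive" and "cafe" with accents, written as explicit UTF-8 bytes.
let text = "na\xC3\xAFve,caf\xC3\xA9"

# Splitting on an ASCII comma works on the raw bytes; no decoding needed.
let parts = text.split(',')
doAssert parts == @["na\xC3\xAFve", "caf\xC3\xA9"]

# Counting characters is a unicode feature, so it needs the decoder.
doAssert text.len == 12      # bytes
doAssert text.runeLen == 10  # code points
```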

Python

In Python 2 the basic string type was used for text strings and byte strings.

When they updated to 3.0, they decided to separate text strings from byte strings and treat all text strings as unicode.

“Python 3.0 mostly uses UTF-8 everywhere, but it was not a deliberate choice and it caused many issues when the locale encoding was not UTF-8.”

They were of the opinion that being able to loop over the code points and index them like ascii is important. But they didn’t commit. There was a build flag for how strings are stored: wide 32 bit or narrow 16 bit.

32 bit characters work since unicode guarantees that code points will not exceed 10FFFF. But 32 bits waste a lot of space for most text strings so they developed ways to compress and store smaller strings.

Both James and Victor explain well how python unicode support works and has evolved.

Rust

Rust is a newer language and its text strings are UTF-8. It ensures that text strings are always correctly encoded.

Nim

Nim strings are also UTF-8 encoded however they allow invalid encoded strings. I think this is the sweet spot. You have performance since you don’t have to validate as much and more importantly you can use strings as byte strings as well as text strings. The bulk of the code is the same for each. When you need unicode features you can validate and pull in the special unicode handling code.

I believe UTF-8 is the One True Encoding.

UTF-8 Decoder

I wrote the utf-8 decoder for statictea because I found a bug in nim’s unicode validator. I am amazed how many common programs have bugs in their decoders.

I documented what I found in the utf8tests project. You can see the test results and the decoder in it:

For statictea, instead of referencing the decoder from the utf8tests project, I just copied the utf8decoder module into statictea. When the external module is updated, the statictea build process will tell that the module changed and should be updated.

What’s cool about the decoder:

  • it’s fast and small
  • it passes all the tests
  • it returns byte sequences for valid and invalid cases
  • it can start in the middle, no need to start at the beginning of a string
  • it is easy to build high level functions with it

The decoder is only a few lines of code and it is table driven.

The decoder self corrects and synchronizes if you start in the middle of a character byte sequence. The first return will be invalid, but the following return sequences will be valid.
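Synchronizing is possible because UTF-8 continuation bytes are easy to recognize. The sketch below is not the statictea decoder and it behaves differently (it skips ahead instead of returning an invalid sequence first); isContinuation and nextCharStart are hypothetical helpers that only illustrate why a decoder can recover when it starts in the middle of a character.

```nim
proc isContinuation(ch: char): bool =
  ## UTF-8 continuation bytes have the bit pattern 10xxxxxx.
  result = (ord(ch) and 0xC0) == 0x80

proc nextCharStart(s: string, pos: int): int =
  ## Skip continuation bytes to reach the start of the next character.
  result = pos
  while result < s.len and isContinuation(s[result]):
    inc(result)

let sample = "\xC3\xA9x"  # U+00E9 (two bytes) followed by the letter x
doAssert nextCharStart(sample, 0) == 0  # already at a character start
doAssert nextCharStart(sample, 1) == 2  # started in the middle, moved to x
```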

The utf8decoder module contains useful functions yieldUtf8Chars, validateUtf8String, sanitizeUtf8 and utf8CharString all built around the decoder.

It’s important that your decoder passes the tests. For example, Microsoft introduced a security issue because their decoder allowed overlong characters, i.e. it allowed multiple encodings for the same character. A hacker exploited this by using a multi-byte encoding of the slash separator in a file path to access private files.
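To make the overlong problem concrete, here is a minimal sketch for the two byte case only. The procedure names are hypothetical and this is not how the utf8decoder module is written; it just shows that the bytes 0xC0 0xAF decode to the same value as a plain slash when nothing checks for overlong forms.

```nim
proc decodeTwoBytes(b1, b2: byte): int =
  ## Decode a two byte UTF-8 sequence without validating it.
  result = ((int(b1) and 0x1F) shl 6) or (int(b2) and 0x3F)

proc isOverlong(b1, b2: byte): bool =
  ## A two byte sequence must encode U+0080 or higher. Lower values
  ## already have a one byte encoding, so the sequence is overlong.
  result = decodeTwoBytes(b1, b2) < 0x80

# 0x2F is the one byte slash; 0xC0 0xAF decodes to the same value.
doAssert decodeTwoBytes(0xC0'u8, 0xAF'u8) == ord('/')
doAssert isOverlong(0xC0'u8, 0xAF'u8)
# 0xC3 0xA9 is the normal encoding of U+00E9.
doAssert not isOverlong(0xC3'u8, 0xA9'u8)
```

A path check that looks only for the byte 0x2F would miss the two byte form, which is why a decoder has to reject it.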

Statictea Dependencies Graphs

Statictea has two types of dependency graphs, the module dependencies and the nim system module dependencies. You can learn a lot about a project from its dependencies.

Below is the link to the statictea’s source code documentation. The module dependency graph is at the top. If you scroll down to the bottom you see the nim system module dependency graph. If you are viewing on a phone, you can pinch zoom to see more detail.

statictea’s source code docs

Module Dependencies

A graph node represents a nim source code module (file). The lines connecting the nodes show the dependencies. For example the parseCommandLine module on the left depends on the cmdline module.

From the module dependency graph you can see:

  • which modules you can share with other projects
  • the frequently used modules
  • the relative module file sizes
  • the project entry point
  • the stand alone modules

It’s easier to edit and test modular code because of well defined and simple interfaces.

The green nodes do not depend on any module. These are the modules you can easily share with other projects.

The red dependency line tells you that the module has only one dependency. I try to rework the code to remove the dependency if feasible. The opresult and matches modules are examples of this.

If you put all your code in one module, it wouldn’t have any dependencies! However it would be too bloated and not granular enough for sharing with other projects. You can tell the relative size of a module by the size of its node. runCommand is the biggest module.

The comparelines, jsondocraw and stfrunner modules are stand alone commands so they are not connected in this graph.

Nim Module Graph

The nim module graph at the bottom of the page tells you the nim language APIs you need. It shows which nim modules are used by which statictea modules.

The green nim modules are only used by one statictea module. I strive for green modules, that way all related functions are grouped together.

Often I wrap a nim module. The statictea wrapper module exposes a limited and simple interface tailored for statictea needs. This was done for the re module. The graph tells you that statictea only uses the regular expression functions in the regexes module and not the multitude of functions in the re module. The readjson module is another example of this.

Building Graphs

You build the dependency graph with the nimble task dotsrc and the nim module graph with dotsys. Search for createDependencyGraph and createDependencyGraph2 in the nimble file.

Each graph is an svg file. The graphviz program produces the file based on a text file defining the dependencies. The text file is created by the nimble task starting from the data generated by nim’s genDepend command.

* https://graphviz.org/
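As an illustration of the text file step, here is a minimal sketch that turns a list of dependency pairs into graphviz dot text and colors the modules without dependencies green. It is not the code in the nimble file: dotText is a hypothetical procedure, and the input is a hand written pair taken from the parseCommandLine example above rather than data from genDepend.

```nim
import sets, strutils

proc dotText(modules: seq[string], deps: seq[(string, string)]): string =
  ## Build graphviz dot text from (module, dependency) pairs.
  ## Modules that depend on nothing are colored green.
  var hasDeps = initHashSet[string]()
  for dep in deps:
    hasDeps.incl(dep[0])
  var lines = @["digraph dependencies {"]
  for name in modules:
    if name notin hasDeps:
      lines.add("  " & name & " [color=green];")
  for dep in deps:
    lines.add("  " & dep[0] & " -> " & dep[1] & ";")
  lines.add("}")
  result = lines.join("\n")

let text = dotText(@["parseCommandLine", "cmdline"],
                   @[("parseCommandLine", "cmdline")])
echo text
```

If you save the output to a file, say deps.dot, graphviz can produce the svg with a command like dot -Tsvg deps.dot -o deps.svg.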

StfRunner

The stfRunner command is used for running statictea system tests.

StfRunner executes a stf test file which tests a feature. A stf file contains instructions for creating files, running files and comparing files.

A normal test creates all the input files, runs statictea then compares the outputs to the expected values.

Although stfRunner was written to test statictea, you can use it to test any command line program.

What’s cool about it:

* you specify each test in one file
* a test with its documentation looks good in a markdown reader
* the test file format is simple
* you can test standard out, standard error, the error code as well as files

Below is the hello world statictea readme example. The test creates four files “cmd.sh”, “hello.json”, “hello.html” and “stdout.expected”. It then runs the cmd.sh file which creates two output files: “stdout” and “stderr”. The final steps compare the output files with the expected files.

stf file, version 0.1.0

# Hello World

Readme Hello World example.

### File cmd.sh command

~~~
$statictea \
-s hello.json \
-t hello.html \
>stdout 2>stderr
~~~

### File hello.html

~~~
hello {s.name}
~~~

### File hello.json

~~~
{"name": "world"}
~~~

### File stdout.expected

~~~
hello world
~~~

### Expected stdout == stdout.expected
### Expected stderr == empty


In this example, six files were created. Using some other test method, it would be hard to manage all these files when you have a lot of tests.

Since the test file is a markdown file, you can create easy to read tests with embedded documentation that look good in a markdown reader. You can see what the hello world test looks like viewed with github’s viewer:

* hello.stf.md

Here are all the statictea system tests:

* statictea stf tests

I’m planning to spin out stfRunner into its own standalone github project.

The letters STF stand for “Single Test File” or my name STeve Flenniken.

I’m considering changing its name to tea-runner. You can think of it as t-runner for test-runner, stf-runner, or maybe rum-runner or beer-run.

Uniform Function Call Syntax

The Uniform Function Call Syntax generalizes calling a function and calling a member function.

This relatively small feature has a big impact on how you think about objects and how you reference code.

You can call a procedure two equivalent ways, by passing all the arguments to it or by treating the first argument like you would for an object when calling its methods (dot notation). For example, calling procName, which takes three parameters, can be done like:

procName(a, b, c)

or

a.procName(b, c)


You can omit the parentheses when they are empty. For example when no parameters:

procName2()

or

procName2

Or when there is one parameter:

procName3(a)

or

a.procName3()

or

a.procName3


The first parameter can be any type; it is not restricted to objects, and you can use variables or literals.

For an add procedure with two integer parameters you can call it as shown below. The last line shows chaining by adding 1 + 2 + 3.

proc add(a, b: int): int = a + b

let v1 = add(1, 2)
let v2 = 1.add(2)
let v3 = 1.add(2).add(3)


You can extend an object defined in another module by defining a procedure with the object as the first parameter.

newMethodName(obj, a, b, c)

And you call it like the other methods of the object:

obj.newMethodName(a, b, c)

When you omit the parentheses it appears like you are accessing a member variable. These are called object getters in some languages. For example:

"str".len
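Here is a small runnable sketch that puts these pieces together. The shout procedure is a hypothetical helper made up for the example; it extends the built-in string type, which is defined outside this module.

```nim
import strutils

proc shout(s: string): string =
  ## Hypothetical helper that extends the built-in string type.
  result = s.toUpperAscii() & "!"

doAssert shout("hi") == "HI!"    # normal call
doAssert "hi".shout() == "HI!"   # dot notation
doAssert "hi".shout == "HI!"     # parentheses omitted
doAssert "hi".shout.len == 3     # chaining; len reads like a member variable
```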

The Nim tutorial talks about how to do “setters”.

* https://nim-lang.org/docs/tut2.html
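As a minimal sketch of a getter and setter pair: Nim lets you name a procedure with a trailing equals sign inside backticks, and the compiler calls it for an assignment that uses dot notation. The Account type and the dollars name are hypothetical, chosen only for the example.

```nim
type
  Account = object
    cents: int

proc dollars(a: Account): float =
  ## Getter: reads like a member variable when called without parentheses.
  result = a.cents / 100

proc `dollars=`(a: var Account, value: float) =
  ## Setter: called for "acct.dollars = value".
  a.cents = int(value * 100)

var acct = Account(cents: 0)
acct.dollars = 12.5
doAssert acct.cents == 1250
doAssert acct.dollars == 12.5
```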

See the Wikipedia page for more information.

* https://en.wikipedia.org/wiki/Uniform_Function_Call_Syntax

Posted in Nim

Nim Macros

The Nim programming language provides powerful meta-programming capabilities with nim macros. Macros run at compile time and they operate on nim’s abstract syntax tree (AST) directly. You write macros using the regular nim language.

This post tells how to write nim macros with lots of simple examples.

You can write a macro using a string of code or by creating AST structures. I call these two styles of macros:

  • text style macro
  • AST style macro

You can also categorize nim macros by how you invoke them.

1. Expression macro

You invoke an expression macro like a procedure call and it will generate new code at that point in the program. The macro generates AST from scratch and it is inserted at the point it is called.

2. Pragma macro

You invoke a pragma macro when a procedure is compiled by tagging the procedure with a pragma and naming the macro as the pragma. The procedure AST is passed to the macro for modification.

3. Block macro

You invoke a block macro when a block of code is compiled by naming the macro as the block. The block AST is passed to the block macro for modification.

You define each type of macro the same except the pragma and block macros have a hidden last parameter for the AST. In all cases macros return an AST.

Simple Example

Here is a simple nim program stored in the file t.nim. We will use it to investigate nim macros.

proc hello() =
  echo "hi"

hello()

The program defines the hello procedure then calls it.

Here is the output when compiling and running the program. All the Hint lines have been removed from the output for simplicity.

nim c -r t
hi

Text Style Expression Macro

Let’s write a text style expression macro to generate the hello proc above.

import macros

macro gen_hello(): typed =
  let source = """
proc hello() =
  echo "hi"
"""
  result = parseStmt(source)

gen_hello()
hello()

Here is the output when compiling and running:


nim c -r t
hi

The macro is defined like a procedure except you use “macro” instead of “proc”. The result is the AST you want to insert at the point the macro is called. The parseStmt converts the string to AST. The “hello()” call calls the hello procedure generated by the macro.

AST Style Expression Macro

Now let’s write the same expression macro in AST style.

Before we do that we need to know what the AST looks like. You could consult the macros module docs, but it is easier to run a couple of macros from the macros module to dump out the code so you can see the AST.

For example here is code to dump out our simple hello program using the dumpTree macro.

import macros
dumpTree:
  proc hello() =
    echo "hi"

When running it you get a list of AST nodes indented to show the hierarchy. At the root is a StmtList (statement list) and it contains one node called ProcDef for the definition of the procedure named hello.

StmtList
  ProcDef
    Ident !"hello"
    Empty
    Empty
    FormalParams
      Empty
    Empty
    Empty
    StmtList
      Command
        Ident !"echo"
        StrLit hi

You can also use the dumpAstGen macro. It will generate the textual code needed to build the AST.

import macros
dumpAstGen:
  proc hello() =
    echo "hi"

When running it you get:

nnkStmtList.newTree(
  nnkProcDef.newTree(
    newIdentNode(!"hello"),
    newEmptyNode(),
    newEmptyNode(),
    nnkFormalParams.newTree(
      newEmptyNode()
    ),
    newEmptyNode(),
    newEmptyNode(),
    nnkStmtList.newTree(
      nnkCommand.newTree(
        newIdentNode(!"echo"),
        newLit("hi")
      )
    )
  )
)

Now that we know the required AST we can write our AST style expression macro. We take the dumpAstGen output and assign it to result.

import macros
macro gen_hello(): typed =
  result = nnkStmtList.newTree(
    nnkProcDef.newTree(
      newIdentNode(!"hello"),
      newEmptyNode(),
      newEmptyNode(),
      nnkFormalParams.newTree(
        newEmptyNode()
      ),
      newEmptyNode(),
      newEmptyNode(),
      nnkStmtList.newTree(
        nnkCommand.newTree(
          newIdentNode(!"echo"),
          newLit("hi")
        )
      )
    )
  )
gen_hello()
hello()

Here is the output when compiling and running:

nim c -r t
hi

Pragma Macro

A pragma macro has the same name as a nim pragma. A pragma is specified with curly braces and dots like: {.echoName.}. You add pragmas to procedures.

Let’s write a pragma macro to display the procedure’s name when the procedure is called. For our hello procedure the macro would transform it to:

proc hello() =
  echo "hello"
  echo "hi"

Looking back at the output from dumpAstGen we see the AST structure we need to generate a command that echos “hello”.

    nnkStmtList.newTree(
      nnkCommand.newTree(
        newIdentNode(!"echo"),
        newLit("hello")
      ),

But we do not want to show “hello” for all procedures but instead show the procedure’s name. The name comes from the top of the tree in the IdentNode.

nnkStmtList.newTree(
  nnkProcDef.newTree(
    newIdentNode(!"hello"),

Here is our starting attempt at writing the pragma macro. The pragma and macro are called echoName. The macro is passed the AST of the procedure, in this case the procedure is main. The main procedure is annotated with the pragma. Notice it goes at the end of the procedure definition. The line “let msg = name(x)” gets the procedure name and the next line displays it.

import macros

macro echoName(x: untyped): untyped =
  let msg = name(x)
  echo msg

proc main (p: int): string {.echoName.} =
  result = "test"

Here is the output when compiling and running. During the macro processing step it outputs “main”. You can debug your macro with echo statements.

nim c -r t
Hint: used config file '/usr/local/Cellar/nim/0.17.2/nim/config/nim.cfg' [Conf]
Hint: system [Processing]
Hint: t [Processing]
Hint: macros [Processing]
main
Hint:  [Link]

The “name” procedure is defined in the macros module. It returns the name of the procedure given a procedure AST node.

The meta-type “untyped” matches anything. It is lazy evaluated so you can pass undefined symbols to it.

There are two other meta-types, typed and typedesc. They are not lazy evaluated.

Here is a working pragma macro that echoes the procedure name when it is called. The “let name = $name(x)” line gets the name of the procedure as a string. The next line creates a new node that echoes the name. The insert adds the node to the body of the procedure as the first statement. You can use treeRepr for debugging.

Our pragma macro is invoked at compile time for each proc tagged with the {.echoName.} pragma.

import macros

macro echoName(x: untyped): untyped =
  let name = $name(x)
  let node = nnkCommand.newTree(newIdentNode(!"echo"), newLit(name))
  insert(body(x), 0, node)
  # echo "treeRepr = ", treeRepr(x)
  result = x

proc add(p: int): int {.echoName.} =
  result = p + 1

proc process(p: int) {.echoName.} =
  echo "ans for ", p, " is ", add(p)

process(5)
process(8)

Here is the output when compiling and running:

nim c -r t

process
add
ans for 5 is 6
process
add
ans for 8 is 9

Now we enhance the macro to show how you pass parameters to pragmas. In this example we pass a custom message string. When invoking the pragma you add the parameter after a colon as shown below.

By default all arguments are AST expressions. The msg string is passed to the macro as a StrLit node, which happens to be what the echo command node needs as its argument.

The ! operator in the macros module creates an identifier from a string for newIdentNode.

import macros

macro echoName(msg: untyped, x: untyped): untyped =
  let node = nnkCommand.newTree(newIdentNode(!"echo"), msg)
  insert(body(x), 0, node)
  result = x

proc add(p: int): int {.echoName: "calling add proc".} =
  result = p + 1

proc process(p: int) {.echoName: "calling process".} =
  echo "ans for ", p, " is ", add(p)

process(5)
process(8)

Here is the output when compiling and running:

calling process
calling add proc
ans for 5 is 6
calling process
calling add proc
ans for 8 is 9

Pass Normal Parameters

You can pass normal types to macros with the “static” syntax. Here is an example of passing an int. The macro echoes the name concatenated with the number.

import macros

macro echoName(value: static[int], x: untyped): untyped =
  let node = nnkCommand.newTree(newIdentNode(!"echo"), newLit($name(x) & $value))
  insert(body(x), 0, node)
  result = x

proc add(p: int): int {.echoName: 42.} =
  result = p + 1

proc process(p: int) {.echoName: 43.} =
  echo "ans for ", p, " is ", add(p)

process(5)
process(8)

output:

process43
add42
ans for 5 is 6
process43
add42
ans for 8 is 9

Multiple Macro Parameters

You can pass one parameter to a pragma macro. If you want to pass more values, you can use a tuple. Here is an example of passing a number and a string to the macro.

import macros

type
  Parameters = tuple[value: int, ending: string]

macro echoName(p: static[Parameters], x: untyped): untyped =
  # echo "x = ", treeRepr(x)
  let node = nnkCommand.newTree(newIdentNode(!"echo"),
               newLit($name(x) & $p.value & p.ending))
  insert(body(x), 0, node)
  result = x

proc add(p: int): int {.echoName: (42, "p1").} =
  result = p + 1

proc process(p: int) {.echoName: (43, "p2").} =
  echo "ans for ", p, " is ", add(p)

process(5)
process(8)

Here is the output when compiling and running:

process43p2
add42p1
ans for 5 is 6
process43p2
add42p1
ans for 8 is 9

Block Macro

If you name a block with the name of a macro, the macro is invoked when the block is compiled. The AST of the block is passed to the macro as the last parameter.

Here is an example block macro that prints out the AST passed to it.

import macros

macro echoName(x: untyped): untyped =
  echo "x = ", treeRepr(x)
  result = x

echoName:
  proc add(p: int): int =
    result = p + 1

  proc process(p: int) =
    echo "ans for ", p, " is ", add(p)

process(5)
process(8)

Here are the results when compiling and running.

x = StmtList
  ProcDef
    Ident !"add"
    Empty
    Empty
    FormalParams
      Ident !"int"
      IdentDefs
        Ident !"p"
        Ident !"int"
        Empty
    Empty
    Empty
    StmtList
      Asgn
        Ident !"result"
        Infix
          Ident !"+"
          Ident !"p"
          IntLit 1
  ProcDef
    Ident !"process"
    Empty
    Empty
    FormalParams
      Empty
      IdentDefs
        Ident !"p"
        Ident !"int"
        Empty
    Empty
    Empty
    StmtList
      Command
        Ident !"echo"
        StrLit ans for
        Ident !"p"
        StrLit  is
        Call
          Ident !"add"
          Ident !"p"

To write the macro so it prints out the name of the procedures when called, we need to find the procedure nodes in the AST and add the echo as before.

You can use the children iterator to loop through the child nodes of the AST statements. You find the procs by checking for the node kind nnkProcDef. You add the nnk prefix to the names output by treeRepr. Notice we added echo statements to the block that we need to skip over.

import macros

macro echoName(x: untyped): untyped =
  for child in x.children():
    if child.kind == nnkProcDef:
      let node = nnkCommand.newTree(newIdentNode(!"echo"),
                   newLit($name(child)))
      insert(body(child), 0, node)
  result = x

echoName:
  echo "an echo statement"
  proc add(p: int): int =
    result = p + 1
  echo "another echo statement"
  proc process(p: int) =
    echo "ans for ", p, " is ", add(p)

process(5)
process(8)

Here is the output:

an echo statement
another echo statement
process
add
ans for 5 is 6
process
add
ans for 8 is 9

The following shows how to pass parameters to your block macros. In this example we pass the string “called”.

import macros

macro echoName(msg: static[string], x: untyped): untyped =
  for child in x.children():
    if child.kind == nnkProcDef:
      let node = nnkCommand.newTree(newIdentNode(!"echo"),
                   newLit($name(child) & "-" & $msg))
      insert(body(child), 0, node)
  result = x

echoName("called"):
  echo "an echo statement"
  proc add(p: int): int =
    result = p + 1
  echo "another echo statement"
  proc process(p: int) =
    echo "ans for ", p, " is ", add(p)

process(5)
process(8)

Here is the output:

an echo statement
another echo statement
process-called
add-called
ans for 5 is 6
process-called
add-called
ans for 8 is 9

More Information

See the macros module documentation for the complete AST syntax and other useful procedures and operators.

https://nim-lang.org/docs/macros.html


Processing DNG

How is a JPEG generated from a DNG image file?

DNG is the standard raw format. I have been using it for many years.

Even though it adds a conversion step, I think it is worthwhile.

I load my camera raw photos onto my machine, convert them to DNG using Adobe’s raw converter, then I delete the original raws.

Like the old 35mm negative film I save DNG files forever so I can make new images and get the best quality output. In the old days you would go to your darkroom or drugstore to have the negative made into a print. This process of enlarging, cropping and color adjustment is called processing or developing the image.

In the digital world JPEG is like the old print. It is the final result for photos. There are other formats but none come close to JPEGs.

The DNG to JPEG processing step is mostly ignored by the articles on raw and DNG files.

I haven’t thought much about the full ramifications of the processing step until recently even though I have been using DNG for over a decade.

My raw files contain a small preview; I set it to 1024 x 768 to save storage space.

I do not store JPEGs for my raw files. I don’t want to maintain two files for the same image. When I edit the image, it would be a pain to keep the JPEG synced. Among other things, I would have to come up with a scheme so two duplicate images do not show when viewing.

Instead the raw files are processed on the fly when looking at them. I use an old version of Adobe Bridge as my viewer. The processing takes about two seconds on my machine the first time. After the first time it is fast because the processed data is cached.

I make JPEGs whenever I want to share my images, for example when posting to my own or other websites or when transferring to friends.

In the old days you would write down cropping, sizing and exposure information for the developer to follow.

The equivalent with the digital negative is the embedded raw metadata. It holds the developer information.

The raw pixel data does not change when you edit in Camera Raw or Adobe Lightroom, just the embedded metadata changes. It contains data for the exposure, cropping etc. as text and numbers, for example:

Exposure .90
Cropping 1,2,3,4

There is no standard way to process the DNG to make a JPEG. The software to read the raw pixel data and apply the metadata to process it is proprietary. Adobe does this in their apps like Bridge and Lightroom seamlessly. It is easy to miss this important processing step.

You do notice it if you don’t have any Adobe applications. Try viewing without. For long term storage this is an issue.

There is a great open source command line program called dcraw that can process raw. It is used internally by a lot of apps that say they handle DNG.

IrfanView, a popular free Windows image viewer, uses dcraw. So does ImageMagick, a popular open source image library. Raw editing apps on Linux like UFRaw use it too.

The required processing metadata is documented by the DNG specification but how to interpret it is not specified.

The JPEG you get from dcraw does not look like the JPEG generated by Bridge. The exposure, cropping and rotation specified in the metadata is not honored. There are a lot of parameters to control dcraw, but none that control these options.

ImageMagick supports rotation and cropping, so if you were writing the processing code yourself you could figure out how to do this. At least these two seem straightforward. But what about exposure and shadow adjustment, color and the others? You need to be an image processing expert.
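Here is a minimal sketch of the two easy steps, cropping and rotation, using ImageMagick's `convert` command. It assumes the crop is stored as fractions of the image width and height and that the crop is applied before the rotation; both are my assumptions, not something the DNG specification tells you. The function names are made up for the example.

```python
import subprocess


def crop_geometry(width, height, left, top, right, bottom):
    """Turn fractional crop edges (0.0 to 1.0) into a WxH+X+Y geometry."""
    x = round(left * width)
    y = round(top * height)
    w = round((right - left) * width)
    h = round((bottom - top) * height)
    return "%dx%d+%d+%d" % (w, h, x, y)


def convert_command(src, dst, geometry, degrees):
    """Build an ImageMagick command that crops, then rotates, then saves."""
    # +repage resets the canvas offset left behind by -crop.
    return ["convert", str(src), "-crop", geometry, "+repage",
            "-rotate", str(degrees), str(dst)]


def crop_and_rotate(src, dst, geometry, degrees):
    """Run the command. Requires ImageMagick's convert on the PATH."""
    subprocess.run(convert_command(src, dst, geometry, degrees), check=True)
```

For a 4000 by 3000 image with crop edges 0.1, 0.2, 0.9 and 0.8, `crop_geometry` gives `3200x1800+400+600`. Newer ImageMagick versions name the command `magick` instead of `convert`.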

Adobe provides an SDK for dealing with DNG. I used it about seven years ago to produce JPEGs. It did not support cropping and rotation then. I’m not sure what it supports now. It was similar to dcraw and it took about the same two seconds to process.

I wonder how the cameras generate their JPEGs? My new camera can take ten frames a second. These frames are made from the raw pixel data. My guess is that the JPEG processing takes place later with a parallel process. I’ve never noticed any delay to the monitor which needs similar processing to create the image shown.

I have been thinking about this because I am writing a photo website to handle my raw workflow.

I was planning to upload my DNGs and generate JPEGs from them as needed on the server running on Linux. There doesn’t seem to be any way to do this without rolling my own code and reverse engineering what Adobe has done. This is more than I want to do.

Google+ and Google Picasa say they support DNG. I am not sure whether they do more than dcraw on the processing side.

Flickr doesn’t support DNG. Interesting ideas thread:

flickrideas

You can make edits in Adobe Lightroom and see them in Adobe Photoshop. It gives the illusion that there exists a well known way to process the raw data. Can you make edits in Lightroom and see them in Aperture? Adobe can share the processing software between Adobe apps but do they share this code with other companies?

I found interesting information on Apple Aperture.

https://thephotosexpert.com/forum/raw-vs-dng/8766#.VIJHAAAgLA

From Walter Rowe
http://www.walterrowe.com/

Hi Joseph,

Let me see if I can be more clear.

Yes, Aperture can import a raw file in DNG format. And it knows the camera make and model and provides unique raw file processing based on that information. Aperture also consumes all of the meta data embedded in the DNG file like IPTC, keywords, contact info, ratings, labels, etc. All is well and good on the import side. The export side is where Aperture is incomplete regarding DNG.

Aperture can export a DNG master (eg. “Export master..”), provided it was imported as a DNG. It appears to only spit out the original DNG file you imported and any meta data you have written back to the master DNG file inside Aperture.

There seem to be two things missing from Aperture’s “Export master..” process for DNG files. The first is including an embedded, fully-rendered preview. This is an sRGB JPG that is fully baked with all your Aperture adjustments. Second is all of the adjustment settings themselves. I think Apple could add both of these easily enough if they chose.

The Adobe products include these pieces of data. I can “edit” a DNG in Lightroom, save everything back to the DNG file, open it in Photoshop, and see all of my adjustments from Lightroom. Likewise, I can make adjustments to a DNG file in Adobe Camera Raw, save them, and see these adjustments in Lightroom. And both products will embed the updated, baked preview inside the DNG file.

The embedded preview can be consumed by image management tools like MediaOnePro (formerly iView Media Pro). This frees image mgmt tools from needing to know how to interpret and render raw sensor data from different camera makers, and lets these tools include color-accurate thumbnails and previews in the image management database.

The DNG file format is nice. It retains all of the manufacturer’s original raw file data, can include the original raw file itself, can incorporate a “baked” preview with all your adjustments, can include all of the raw adjustments, and can include all of your meta data. It is a nicely packaged file format with everything you need for long term image management.

It would be nice to see Apple fully support all the features of the DNG file format in the “Export master..” process.

More information about DNG can be found on Adobe’s DNG page.

Does that help?
Walter

From a practical point of view I can get JPEGs by creating them using Adobe apps and uploading them to my website. But this adds manual steps I would rather not have.

I am beginning to think that the best solution for my website is to change the raw preview setting so the full resolution JPEG is embedded in my DNG files.

When you edit a raw file the preview needs to be updated to match. This is the case with Adobe applications.

I need to figure out an easy way to update all my DNG files to have full resolution previews. Once that is done all my website needs to do is extract the embedded previews.
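The extraction step could look something like the sketch below. It assumes the ExifTool command line program is installed and that the preview is stored under the `PreviewImage` tag; in some files it may be under another tag such as `JpgFromRaw`, so treat the tag name as an assumption. The function names are made up for the example.

```python
import subprocess
from pathlib import Path


def preview_command(dng_path):
    """Build an exiftool command that writes the embedded preview to stdout."""
    # -b: output the tag value as binary data.
    return ["exiftool", "-b", "-PreviewImage", str(dng_path)]


def preview_target(dng_path, out_dir):
    """Name the JPEG after the DNG, placed in out_dir."""
    return Path(out_dir) / (Path(dng_path).stem + ".jpg")


def extract_preview(dng_path, out_dir):
    """Save the embedded preview as a JPEG. Requires exiftool on the PATH."""
    result = subprocess.run(preview_command(dng_path),
                            check=True, capture_output=True)
    if not result.stdout:
        raise ValueError("no embedded preview found in %s" % dng_path)
    target = preview_target(dng_path, out_dir)
    target.write_bytes(result.stdout)
    return target
```

The website would call `extract_preview` once per uploaded DNG and serve the resulting JPEG. The preview is only as current as the last time an Adobe app updated it.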

Words for Rain

They say that Eskimos have 100 words for snow. As a native of Seattle, I wonder how many words for rain I know. This is the list I came up with.

Rain
Showers
Mist
Drizzle
Downpour
Cloudburst
Sleet
Torrential downpour
Monsoon
Pouring
Cats and dogs
Pineapple Express
Freezing rain
Sprinkle
Liquid sunshine
Deluge
Drencher
Flood
Flurry
Precipitation
Raindrops
Rainfall
Rainstorm
Sheets
Wet weather
Squall