
Parsing TIFF EXIF entries


After writing my previous post, Extracting the original datetime from Canon and Olympus photos, I spent a couple of days improving my little tiff-parser so that it can read arbitrary TIFF entries rather than only the original datetime. The library can now extract entries from several of the file’s image file directories (IFDs): IFD#0, EXIF, and GPSInfo. The resulting API looks like this:

package main

import (
	"fmt"
	"os"

	"github.com/fedragon/tiff-parser/tiff"
	"github.com/fedragon/tiff-parser/tiff/entry"
)

func main() {
	r, err := os.Open("...")
	if err != nil {
		panic(err)
	}
	defer r.Close()

	p, err := tiff.NewParser(r)
	if err != nil {
		panic(err)
	}

	// an `entry.ID` is simply a type alias for `uint16`
	model := entry.ID(0x0110)

	// only needed when your entry is not listed in `tiff.Defaults`
	p.WithMapping(map[entry.ID]tiff.Group{
		model: tiff.GroupIfd0, // Model belongs to the first IFD (aka IFD#0)
		// ...
	})

	// provide the IDs of the entries you would like to collect
	entries, err := p.Parse(entry.ImageWidth, model)
	if err != nil {
		panic(err)
	}

	if en, ok := entries[entry.ImageWidth]; ok {
		// read the value, casting it to the expected data type
		width, err := p.ReadUint16(en)
		if err != nil {
			panic(err)
		}
		fmt.Printf("width: %v\n", width)
	}

	if en, ok := entries[model]; ok {
		// use a distinct name so the `model` ID declared above is not shadowed
		name, err := p.ReadString(en)
		if err != nil {
			panic(err)
		}
		fmt.Printf("model: %v\n", name)
	}
}

In a nutshell:

  • the client tells the parser where the entries live in the TIFF structure (only needed when they are not already covered by the defaults)
  • the parser parses the file and returns a map containing the requested entries that it actually found
  • the client reads each entry’s value using one of the provided .Read* methods
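
To make the second step more concrete, here is a minimal, self-contained sketch of how walking a single IFD and collecting only the requested entries could work. It is not tiff-parser's actual implementation: `rawEntry`, `collect`, `sampleIFD` and `le` are hypothetical names introduced for illustration, and the only thing taken as given is the 12-byte entry layout from the TIFF 6.0 specification (tag, type, count, value or offset).

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// le is the byte order used by the sample data below.
var le = binary.LittleEndian

// rawEntry mirrors the 12-byte IFD entry layout from the TIFF 6.0 specification.
// The name and fields are hypothetical, not tiff-parser's own types.
type rawEntry struct {
	Tag   uint16  // entry ID, e.g. 0x0100 for ImageWidth
	Type  uint16  // data type, e.g. 3 = SHORT, 4 = LONG
	Count uint32  // number of values
	Value [4]byte // inline value, or an offset when the value needs more than 4 bytes
}

// collect walks the IFD at the given offset and returns only the wanted
// entries that are actually present, keyed by tag.
func collect(buf []byte, order binary.ByteOrder, offset uint32, wanted ...uint16) (map[uint16]rawEntry, error) {
	if uint64(offset)+2 > uint64(len(buf)) {
		return nil, fmt.Errorf("IFD offset %d is out of range", offset)
	}
	count := uint64(order.Uint16(buf[offset:]))
	start := uint64(offset) + 2
	if start+count*12 > uint64(len(buf)) {
		return nil, fmt.Errorf("IFD with %d entries is truncated", count)
	}

	want := make(map[uint16]bool, len(wanted))
	for _, id := range wanted {
		want[id] = true
	}

	found := make(map[uint16]rawEntry)
	for i := uint64(0); i < count; i++ {
		b := buf[start+i*12 : start+(i+1)*12]
		tag := order.Uint16(b[0:2])
		if !want[tag] {
			continue
		}
		e := rawEntry{Tag: tag, Type: order.Uint16(b[2:4]), Count: order.Uint32(b[4:8])}
		copy(e.Value[:], b[8:12])
		found[tag] = e
	}
	return found, nil
}

// sampleIFD builds a tiny little-endian IFD holding ImageWidth and ImageLength.
func sampleIFD() []byte {
	buf := make([]byte, 2+2*12+4) // entry count + 2 entries + next-IFD offset
	le.PutUint16(buf[0:], 2)
	for i, e := range [][2]uint16{{0x0100, 4000}, {0x0101, 3000}} {
		b := buf[2+i*12:]
		le.PutUint16(b[0:], e[0]) // tag
		le.PutUint16(b[2:], 3)    // type: SHORT
		le.PutUint32(b[4:], 1)    // count
		le.PutUint16(b[8:], e[1]) // value, left-justified in the 4-byte field
	}
	return buf
}

func main() {
	// Model (0x0110) is requested but absent, so it is simply missing from the map.
	found, err := collect(sampleIFD(), le, 0, 0x0100, 0x0110)
	if err != nil {
		panic(err)
	}
	for tag, e := range found {
		fmt.Printf("tag 0x%04x: type %d, count %d\n", tag, e.Type, e.Count)
	}
}
```

Returning a map rather than a slice is what lets the client use the `if en, ok := entries[...]; ok` idiom to tell a missing entry from a present one.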

It would have been more user-friendly to automagically expose each entry’s value according to its expected type but, as manual tests confirmed, there are several manufacturer-specific exceptions to how IFD entries are written, even for basic entries such as ImageWidth (uint16 in CR2, uint32 in ORF). I considered at least promoting types where it is safe to do so (e.g. reading a uint8 with the .ReadUint16() method), but eventually gave up on the idea since it wouldn’t provide significant advantages: the client would still need to know the expected type of the entry.
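
For illustration only, this is roughly what the type promotion I decided against could look like. It is a sketch, not part of tiff-parser: `readUint32`, the `type*` constants and `le` are hypothetical names, and it assumes a single value stored inline in the entry's 4-byte value field. The numeric type codes (1 = BYTE, 3 = SHORT, 4 = LONG) come from the TIFF 6.0 specification.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// le is the byte order used by the sample values below.
var le = binary.LittleEndian

// Unsigned integer data types, numbered as in the TIFF 6.0 specification.
const (
	typeByte  uint16 = 1 // uint8
	typeShort uint16 = 3 // uint16
	typeLong  uint16 = 4 // uint32
)

// readUint32 is a hypothetical helper: it reads one unsigned integer stored
// inline in an entry's 4-byte value field and widens BYTE and SHORT to uint32.
// Values are left-justified in that field, so narrower types start at byte 0.
func readUint32(order binary.ByteOrder, dataType uint16, value [4]byte) (uint32, error) {
	switch dataType {
	case typeByte:
		return uint32(value[0]), nil
	case typeShort:
		return uint32(order.Uint16(value[:2])), nil
	case typeLong:
		return order.Uint32(value[:]), nil
	default:
		return 0, fmt.Errorf("unsupported data type %d", dataType)
	}
}

func main() {
	// The same logical entry, written as SHORT by one file and LONG by another.
	var asShort, asLong [4]byte
	le.PutUint16(asShort[:], 5184)
	le.PutUint32(asLong[:], 4032)

	for _, c := range []struct {
		dataType uint16
		value    [4]byte
	}{{typeShort, asShort}, {typeLong, asLong}} {
		width, err := readUint32(le, c.dataType, c.value)
		if err != nil {
			panic(err)
		}
		fmt.Printf("type %d -> width %d\n", c.dataType, width)
	}
}
```

Widening is always safe for unsigned integers, but it only helps callers who already know they want a number: a string or rational entry still needs its own reader.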