Writing a DNS Sinkhole and running it on a Raspberry Pi
Context #
Last week I found an ancient Raspberry Pi Model A (from now on shortened to RPi) in one of my drawers, so I ended up on the Raspberry Pi website and eventually started reading this article about using an RPi to block ads on a home network using pi-hole. Turning that RPi into a DNS sinkhole sounded like a good way to put it to good use… but writing my own DNS sinkhole sounded even more interesting, so I started digging into the topic. This is an account of my journey so far, which resulted in a learn-by-doing toy project.
Agenda #
I’m first going to write about what I learned along the way about DNS resolution and what a DNS sinkhole is. Then, I’m going to describe the building blocks of my project. Finally, I’m going to describe how I set up my RPi to run it. In the appendix, there will be a brief description of the DNS message format.
Domain Name System (DNS) #
DNS is a distributed, hierarchical registry that maps domain names to the IP addresses (and other records) of the hosts behind them.
Glossary #
DNS Resolver: a machine that receives DNS queries from clients (such as browsers) and takes charge of the process of translating (if possible) domain names into their corresponding IP address(es).
Root Server: a machine that is the entrypoint to the translation process: it holds references to other, more specific nameservers called Top-Level Domain Servers.
Top-Level Domain (TLD) Server: a machine that is in charge of one or more top-level domains (think: `.com`, `.eu`, etc.): it holds references to more specific servers called Authoritative Servers.
Authoritative Server: a machine that is in charge of specific domains (e.g. `github.com`, `lobste.rs`, etc.): it keeps track of their actual IP addresses in data structures called Resource Records.
Resolution #
DNS resolution comes in two flavours: recursive and iterative. I’m going to use the example of a user who wants to open www.github.com to describe both scenarios with diagrams. In a nutshell, the difference is in how much work the DNS Resolver needs to do on its own.
Note: Even though caching is used extensively in DNS resolution to reduce load on the underlying infrastructure, I’m going to leave it out of the following diagrams, to show the basic scenario.
Recursive #
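The DNS Resolver sends the initial DNS query, then the Nameservers take care of the rest. Note that this is a simplified model: on the public Internet, Root and TLD Servers do not recurse on behalf of the resolver and answer with referrals instead, as in the iterative flow.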
- the DNS Resolver sends a DNS query to a Root Server
- the Root Server looks up the TLD Server for the `.com` domain (as per my example) in its own database
- the Root Server forwards the DNS query to the TLD Server
- the TLD Server looks up the Authoritative Server in charge of the `github.com` domain in its own database
- the TLD Server forwards the DNS query to the Authoritative Server
- the Authoritative Server looks up the related resource record in its own database
- the Authoritative Server sends a DNS response containing the IP address for the `www` record of `github.com` to the TLD Server
- the TLD Server forwards the DNS response to the Root Server
- the Root Server forwards the DNS response to the DNS Resolver
+-------+ +---------+ +-------------+ +-------------+ +-----------+ +---------------------+
| User | | Client | | DNSResolver | | RootServer | | TLDServer | | AuthoritativeServer |
+-------+ +---------+ +-------------+ +-------------+ +-----------+ +---------------------+
| | | | | |
| open github.com | | | | |
|--------------------->| | | | |
| | | | | |
| | send DNS query | | | |
| |------------------------->| | | |
| | | | | |
| | | send DNS query | | |
| | |------------------------->| | |
| | | | | |
| | | | what's the IP address of | |
| | | | the TLD Server for .com? | |
| | | |------------------------- | |
| | | | | | |
| | | |<------------------------ | |
| | | | | |
| | | | send DNS query | |
| | | |------------------------------->| |
| | | | | |
| | | | | what's the IP address of |
| | | | | the Authoritative Server |
| | | | | for github.com? |
| | | | |------------------------- |
| | | | | | |
| | | | |<------------------------ |
| | | | | |
| | | | | send DNS query |
| | | | |------------------------------------>|
| | | | | |
| | | | | | what's the IP
| | | | | | address of github.com?
| | | | | |-----------------------
| | | | | | |
| | | | | |<----------------------
| | | | | |
| | | | | send DNS response |
| | | | | (140.82.121.4) |
| | | | |<------------------------------------|
| | | | | |
| | | | send DNS response | |
| | | |<-------------------------------| |
| | | | | |
| | | send DNS response | | |
| | |<-------------------------| | |
| | | | | |
| | send DNS response | | | |
| |<-------------------------| | | |
| | | | | |
| | open 140.82.121.4 | | | |
| |------------------ | | | |
| | | | | | |
| |<----------------- | | | |
| | | | | |
| here you go! | | | | |
|<---------------------| | | | |
| | | | | |
Iterative #
The DNS Resolver interacts with every Nameserver “discovered” along the way, until it comes to a resolution.
- the DNS Resolver sends a DNS query to the Root Server
- the Root Server looks up the IP address of the TLD Server in charge of the `.com` domain, and sends it back to the Resolver
- the DNS Resolver sends a DNS query to the TLD Server, which looks up the IP address of the Authoritative Server in charge of the `github.com` domain, and sends it back to the Resolver
- the DNS Resolver sends a DNS query to the Authoritative Server, which looks up the IP address of the `www` record for `github.com` and sends it back to the Resolver
+-------+ +---------+ +-------------+ +-------------+ +-----------+ +---------------------+
| User | | Client | | DNSResolver | | RootServer | | TLDServer | | AuthoritativeServer |
+-------+ +---------+ +-------------+ +-------------+ +-----------+ +---------------------+
| | | | | |
| open github.com | | | | |
|--------------------->| | | | |
| | | | | |
| | send DNS query | | | |
| |------------------------->| | | |
| | | | | |
| | | send DNS query | | |
| | |------------------------->| | |
| | | | | |
| | | | what's the IP address | |
| | | | of the TLD Server | |
| | | | for .com? | |
| | | |---------------------- | |
| | | | | | |
| | | |<--------------------- | |
| | | | | |
| | | send DNS response | | |
| | |<-------------------------| | |
| | | | | |
| | | send DNS query | | |
| | |------------------------------------------------------->| |
| | | | | |
| | | | | what's the IP address |
| | | | | of the Authoritative |
| | | | | Server for github.com? |
| | | | |----------------------- |
| | | | | | |
| | | | |<---------------------- |
| | | | | |
| | | | send DNS response | |
| | |<-------------------------------------------------------| |
| | | | | |
| | | send DNS query | | |
| | |------------------------------------------------------------------------------------------->|
| | | | | |
| | | | | | what's the IP
| | | | | | address of
| | | | | | github.com?
| | | | | |--------------
| | | | | | |
| | | | | |<-------------
| | | | | |
| | | | | send DNS response |
| | | | | (140.82.121.4) |
| | | | |<----------------------------------|
| | | | | |
| | | | send DNS response | |
| | | |<----------------------------| |
| | | | | |
| | | send DNS response | | |
| | |<-------------------------| | |
| | | | | |
| | send DNS response | | | |
| |<-------------------------| | | |
| | | | | |
| | open 140.82.121.4 | | | |
| |------------------ | | | |
| | | | | | |
| |<----------------- | | | |
| | | | | |
| here you go! | | | | |
|<---------------------| | | | |
| | | | | |
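The iterative flow can be mimicked with a toy, in-memory model. The sketch below is not a real resolver: the `nameserver` type, the `newToyRoot` helper and the zone data are assumptions made up for illustration, and only the IP address is taken from the diagram above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// nameserver is a hypothetical in-memory stand-in for a real DNS server:
// it either knows the final answer or refers the resolver to another server.
type nameserver struct {
	answers   map[string]string      // fully qualified name -> IP address
	referrals map[string]*nameserver // domain suffix -> server to ask next
}

// query returns either an IP address or a referral to a more specific server.
func (ns *nameserver) query(name string) (string, *nameserver) {
	if ip, ok := ns.answers[name]; ok {
		return ip, nil
	}
	for suffix, ref := range ns.referrals {
		if name == suffix || strings.HasSuffix(name, "."+suffix) {
			return "", ref
		}
	}
	return "", nil
}

// resolveIteratively starts at the root and keeps following referrals
// until a nameserver returns an answer (or nobody can help).
func resolveIteratively(root *nameserver, name string) (string, error) {
	ns := root
	for hops := 0; ns != nil && hops < 10; hops++ {
		ip, next := ns.query(name)
		if ip != "" {
			return ip, nil
		}
		ns = next
	}
	return "", errors.New("could not resolve " + name)
}

// newToyRoot wires up Root -> TLD (.com) -> Authoritative (github.com).
func newToyRoot() *nameserver {
	authoritative := &nameserver{answers: map[string]string{"www.github.com": "140.82.121.4"}}
	tld := &nameserver{referrals: map[string]*nameserver{"github.com": authoritative}}
	return &nameserver{referrals: map[string]*nameserver{"com": tld}}
}

func main() {
	ip, err := resolveIteratively(newToyRoot(), "www.github.com")
	fmt.Println(ip, err) // 140.82.121.4 <nil>
}
```

The resolver does all the legwork here: each nameserver only answers the one question it is asked, which is the defining trait of the iterative flavour.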
Sinkhole #
A DNS sinkhole […] is a Domain Name System (DNS) server that has been configured to hand out non-routable addresses for a certain set of domain names.
Source: Wikipedia
Project Scope #
I’m going to write just enough code to fulfill the above definition of DNS sinkhole, so I have decided the minimum set of functionality to be:
- import blacklisted domains at application boot
- listen for DNS queries
- parse DNS queries
- respond to DNS queries for blacklisted domains with non-routable addresses
- respond to DNS queries for legitimate domains with valid addresses (if possible)
In order to respond to legitimate DNS queries, I’ll forward them to a real recursive DNS resolver (i.e. Cloudflare’s `1.1.1.1`) and then return its response to the client.
According to RFC 3330, addresses in the `0.0.0.0/8` range refer to source addresses for “this” network, so I arbitrarily decided to return `0.0.0.42` for blacklisted domains.
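As a quick sanity check, the standard `net/netip` package can confirm that the chosen address falls inside that block. This is only a sketch: the variable and function names (`thisNetwork`, `sinkholeAddr`, `inThisNetwork`) are mine, not part of the project.

```go
package main

import (
	"fmt"
	"net/netip"
)

// thisNetwork is the 0.0.0.0/8 block ("this" network).
var thisNetwork = netip.MustParsePrefix("0.0.0.0/8")

// sinkholeAddr is the address returned for blacklisted domains.
var sinkholeAddr = netip.MustParseAddr("0.0.0.42")

// inThisNetwork reports whether s is a valid address inside 0.0.0.0/8.
func inThisNetwork(s string) bool {
	addr, err := netip.ParseAddr(s)
	return err == nil && thisNetwork.Contains(addr)
}

func main() {
	fmt.Println(thisNetwork.Contains(sinkholeAddr)) // true
	fmt.Println(sinkholeAddr.As4())                 // [0 0 0 42]
	fmt.Println(inThisNetwork("1.1.1.1"))           // false
}
```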
In order to have a curated, aggregated source of blacklisted domains, I’m going to import them from a Steven Black hosts file.
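Importing such a file boils down to reading it line by line. The sketch below assumes the usual hosts-file layout (an address followed by one or more host names, with `#` starting a comment) and that blocked entries are mapped to `0.0.0.0`; the function names and the sample content are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// parseHosts extracts the host names mapped to 0.0.0.0 from hosts-file content.
func parseHosts(r io.Reader) ([]string, error) {
	var domains []string
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		line := scanner.Text()
		// Drop comments, whether they span the whole line or trail an entry.
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i]
		}
		fields := strings.Fields(line)
		// Skip blank lines and entries such as "127.0.0.1 localhost".
		if len(fields) < 2 || fields[0] != "0.0.0.0" {
			continue
		}
		for _, name := range fields[1:] {
			if name == "0.0.0.0" {
				continue // "0.0.0.0 0.0.0.0" is not a domain
			}
			domains = append(domains, strings.ToLower(name))
		}
	}
	return domains, scanner.Err()
}

// parseHostsString is a convenience wrapper for in-memory content.
func parseHostsString(content string) ([]string, error) {
	return parseHosts(strings.NewReader(content))
}

func main() {
	sample := "# sample hosts file\n127.0.0.1 localhost\n0.0.0.0 ads.example.com\n"
	domains, err := parseHostsString(sample)
	fmt.Println(domains, err) // [ads.example.com] <nil>
}
```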
I’m going to run the application on the RPi as a systemd service.
Descoping #
To keep things simple for the first iteration, I’m going to restrict the scope as follows:
- only listen for DNS queries over UDP (no TCP, DNS over HTTPS, etc …)
- only import the set of blacklisted domains at application boot
- only handle a subset of queries (`Type A, Class IN, Recursion Desired: 1`) in the sinkhole
- only handle DNS packets conforming to the original specification in the sinkhole, meaning that their size is at most 512 bytes and EDNS is not supported
Whenever the sinkhole cannot handle a query (bullets 3 and 4), the application will forward it to the upstream DNS resolver instead.
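The two restrictions can be expressed as a single predicate. This is a sketch under assumptions: `queryInfo` is a hypothetical, already-parsed view of a query, and treating any record in the Additional section as a sign of EDNS is a simplification (EDNS carries its options in an OPT pseudo-record there).

```go
package main

import "fmt"

const (
	typeA        = 1   // record type A
	classIN      = 1   // class IN
	maxPacketLen = 512 // maximum message size in the original specification
)

// queryInfo is a hypothetical summary of the fields the decision depends on.
type queryInfo struct {
	Type             uint16
	Class            uint16
	RecursionDesired bool
	AdditionalCount  uint16 // non-zero when an EDNS OPT record is attached
	Size             int    // size of the raw packet in bytes
}

// sinkholeCanHandle reports whether the sinkhole should try to answer the
// query itself; everything else goes straight to the upstream resolver.
func sinkholeCanHandle(q queryInfo) bool {
	return q.Type == typeA &&
		q.Class == classIN &&
		q.RecursionDesired &&
		q.AdditionalCount == 0 &&
		q.Size <= maxPacketLen
}

func main() {
	q := queryInfo{Type: typeA, Class: classIN, RecursionDesired: true, Size: 40}
	fmt.Println(sinkholeCanHandle(q)) // true

	q.Type = 28 // AAAA
	fmt.Println(sinkholeCanHandle(q)) // false
}
```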
Design #
+---------+ +-----------+ +-----------+ +-------------+
| Client | | UDPServer | | Sinkhole | | DNSResolver |
+---------+ +-----------+ +-----------+ +-------------+
| ----------------------\ | | |
| | at application boot |-| | |
| |---------------------| | | |
| | | |
| | import blacklisted domains | |
| |--------------------------- | |
| | | | |
| |<-------------------------- | |
----------------------------\ | | | |
| loop: for every DNS query |-| | | |
|---------------------------| | | | |
| | | |
| send DNS query | | |
|----------------------------------------------->| | |
| | | |
| | can you resolve this DNS query? | |
| |-------------------------------------------------->| |
|----------------------------------------------\ | | |
|| if: DNS query relates to blacklisted domain |-| | |
||---------------------------------------------| | | |
| | | |
| | yes: DNS response with non-routable address | |
| |<--------------------------------------------------| |
| | | |
| send DNS response | | |
|<-----------------------------------------------| | |
| -------\ | | |
| | else |-| | |
| |------| | | |
| | | |
| | no | |
| |<--------------------------------------------------| |
| | | |
| | can you resolve this DNS query? | |
| |----------------------------------------------------------------->|
| | | |
| | | | ...
| | | |----
| | | | |
| | | |<---
| | | |
| | yes: DNS response |
| |<-----------------------------------------------------------------|
| | | |
| send DNS response | | |
|<-----------------------------------------------| | |
| ------\ | | |
| | end |-| | |
| |-----| | | |
| | | |
Building blocks #
Sinkhole #
The sinkhole is essentially just a domain registry equipped with logic to resolve a DNS query. I’m going to determine which domains should be blocked using Steven Black’s well-maintained hosts list. When the application boots, it loads the file contents (previously downloaded locally) into its in-memory registry.
type Sinkhole struct {
	registry map[string]struct{} // if there's an entry for a given domain name, then that domain is blacklisted
}
// Register registers a (sub-)domain
func (s *Sinkhole) Register(name string) error { ... }
// Contains checks for the existence of a (sub-)domain
func (s *Sinkhole) Contains(name string) bool { ... }
// Resolve tries to resolve a query, returning a response and true if the query has been handled, or false otherwise
func (s *Sinkhole) Resolve(query *message.Query) (*message.Response, bool) { ... }
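To make the registry idea concrete, here is one possible way to fill in `Register` and `Contains`. It is a guess at the shape of the code, not the project’s actual implementation: the `NewSinkhole` constructor and the `normalize` helper are hypothetical, and `Resolve` is left out because it depends on the `message` package.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Sinkhole keeps blacklisted domains in a set-like map.
type Sinkhole struct {
	registry map[string]struct{}
}

// NewSinkhole returns a Sinkhole with an empty registry (hypothetical helper).
func NewSinkhole() *Sinkhole {
	return &Sinkhole{registry: make(map[string]struct{})}
}

// normalize lowercases the name and strips whitespace and the trailing dot,
// so that "Ads.Example.com." and "ads.example.com" are the same entry.
func normalize(name string) string {
	return strings.ToLower(strings.TrimSuffix(strings.TrimSpace(name), "."))
}

// Register registers a (sub-)domain.
func (s *Sinkhole) Register(name string) error {
	name = normalize(name)
	if name == "" {
		return errors.New("empty domain name")
	}
	s.registry[name] = struct{}{}
	return nil
}

// Contains checks for the existence of a (sub-)domain (exact match only).
func (s *Sinkhole) Contains(name string) bool {
	_, ok := s.registry[normalize(name)]
	return ok
}

func main() {
	s := NewSinkhole()
	if err := s.Register("ads.example.com"); err != nil {
		fmt.Println("register:", err)
		return
	}
	fmt.Println(s.Contains("ADS.example.com.")) // true
	fmt.Println(s.Contains("example.com"))      // false
}
```

Using `map[string]struct{}` keeps the set cheap, since the empty struct takes no space. Note that with exact matching a subdomain is only blocked if it has been registered explicitly.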
Server #
The server takes care of interacting with the UDP connection and propagating queries to the sinkhole, and then, if necessary, to the upstream DNS resolver. From the application’s point of view, `upstream` is really just a UDP client, so I can abstract its details away using `io.ReadWriteCloser`.
type Server struct {
	sinkhole *dns.Sinkhole
	upstream io.ReadWriteCloser
}
func (s *Server) Serve(ctx context.Context, address string) error { ... }
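Hiding the upstream behind an interface also makes the forwarding step easy to exercise without a network. The sketch below is an illustration only: `forward` and `fakeUpstream` are hypothetical names, and the fake simply echoes the query back with the response bit set.

```go
package main

import (
	"fmt"
	"io"
)

// maxUDPSize is large enough for any UDP payload, since forwarded responses
// are not bound by the 512-byte limit the sinkhole imposes on itself.
const maxUDPSize = 65535

// forward writes the raw query to upstream and reads back a single response.
func forward(upstream io.ReadWriter, query []byte) ([]byte, error) {
	if _, err := upstream.Write(query); err != nil {
		return nil, fmt.Errorf("write query: %w", err)
	}
	buf := make([]byte, maxUDPSize)
	n, err := upstream.Read(buf)
	if err != nil {
		return nil, fmt.Errorf("read response: %w", err)
	}
	return buf[:n], nil
}

// fakeUpstream is a test double satisfying io.ReadWriteCloser.
type fakeUpstream struct{ last []byte }

func (f *fakeUpstream) Write(p []byte) (int, error) {
	f.last = append([]byte(nil), p...)
	return len(p), nil
}

func (f *fakeUpstream) Read(p []byte) (int, error) {
	if len(f.last) == 0 {
		return 0, io.EOF
	}
	n := copy(p, f.last)
	if n > 2 {
		p[2] |= 0x80 // set the Query Response bit
	}
	f.last = nil
	return n, nil
}

func (f *fakeUpstream) Close() error { return nil }

func main() {
	var upstream io.ReadWriteCloser = &fakeUpstream{}
	defer upstream.Close()

	resp, err := forward(upstream, []byte{0xAB, 0xCD, 0x01, 0x00})
	fmt.Printf("% x %v\n", resp, err) // ab cd 81 00 <nil>
}
```

In a real deployment the fake could be swapped for the connection returned by `net.Dial("udp", "1.1.1.1:53")`, since `net.Conn` satisfies `io.ReadWriteCloser`.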
Setup #
Raspberry Pi #
I’m going to reuse hardware that I already have at home:
- 1 Raspberry Pi Model A, first edition
- 1 Power supply
- 1 Ethernet cable
- 1 MicroSD card (8 GB)
I’m going to make a fresh installation of the Raspberry Pi OS Lite using Raspberry Pi Imager, with the following steps:
- Insert the MicroSD card into my laptop
- Launch Raspberry Pi Imager
- Choose `OS > Other > Raspberry Pi OS Lite` (because I don’t need a desktop environment)
- Click on the cogwheel button, then `Set hostname` and `Enable SSH`
- Flash the MicroSD card
- Insert the MicroSD card into the RPi
Next, I’m going to:
- Connect the RPi to my home router via the Ethernet cable
- Connect the RPi to its power supply
- Assign a static IP address to the RPi in my home network
I can now deploy the application by running
RPI_USER=pi make generate-service # generates a systemd service pointing to an executable in the user's home directory
RPI_HOST=raspberrypi RPI_USER=pi make deploy # copies the required files to the RPi via scp
Once the files have been deployed, I’m going to `ssh` into the RPi and run
sudo ./install.sh
Once it completes, I have an enabled `systemd` service that I can finally start with
sudo systemctl start sinkhole.service
Finally, I can tail the service logs by running
sudo journalctl -f -u sinkhole.service
Appendix: DNS Message format #
DNS uses a single message format to represent both queries and responses.
Section | Length | Type | Purpose |
---|---|---|---|
Header | 12 Bytes | Header | Packet information |
Question | Variable | List of Questions | One or more questions indicating the domain(s) and record type |
Answer | Variable | List of Records | The relevant records of the requested type |
Authority | Variable | List of Records | A list of name servers, used to recursively resolve queries |
Additional | Variable | List of Records | Additional resource records |
Header format #
Name | Length | Description |
---|---|---|
Identifier | 16 bits | An identifier assigned to queries: a response must have the same ID as its query |
Query Response | 1 bit | 0 for queries, 1 for responses |
Operation Code | 4 bits | Typically 0 |
Authoritative Answer | 1 bit | Set to 1 if the responding server is authoritative |
Truncated Message | 1 bit | Set to 1 if the message length exceeds 512 bytes |
Recursion Desired | 1 bit | Should the server attempt to resolve the query recursively if it does not have an answer readily available? |
Recursion Available | 1 bit | Set by the server to indicate whether or not recursive queries are allowed |
Reserved | 3 bits | Reserved for later use |
Response Code | 4 bits | Set by the server to indicate the status of the response |
Question Count | 16 bits | The number of entries in the Question Section |
Answer Count | 16 bits | The number of entries in the Answer Section |
Authority Count | 16 bits | The number of entries in the Authority Section |
Additional Count | 16 bits | The number of entries in the Additional Section |
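The header table translates fairly directly into code. The following is a minimal sketch of a header parser based on the bit layout above; the `header` struct and `parseHeader` function are names chosen for this example and may not match the project’s code.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// header mirrors the 12-byte DNS header.
type header struct {
	ID                 uint16
	Response           bool  // Query Response bit
	Opcode             uint8 // Operation Code
	Authoritative      bool
	Truncated          bool
	RecursionDesired   bool
	RecursionAvailable bool
	ResponseCode       uint8
	QuestionCount      uint16
	AnswerCount        uint16
	AuthorityCount     uint16
	AdditionalCount    uint16
}

// parseHeader decodes the first 12 bytes of a DNS message.
func parseHeader(msg []byte) (header, error) {
	if len(msg) < 12 {
		return header{}, errors.New("message shorter than 12 bytes")
	}
	// Bytes 2-3 pack all the flags, from the most significant bit down:
	// QR(1) Opcode(4) AA(1) TC(1) RD(1) RA(1) Reserved(3) RCODE(4).
	flags := binary.BigEndian.Uint16(msg[2:4])
	return header{
		ID:                 binary.BigEndian.Uint16(msg[0:2]),
		Response:           flags&0x8000 != 0,
		Opcode:             uint8(flags>>11) & 0x0F,
		Authoritative:      flags&0x0400 != 0,
		Truncated:          flags&0x0200 != 0,
		RecursionDesired:   flags&0x0100 != 0,
		RecursionAvailable: flags&0x0080 != 0,
		ResponseCode:       uint8(flags & 0x000F),
		QuestionCount:      binary.BigEndian.Uint16(msg[4:6]),
		AnswerCount:        binary.BigEndian.Uint16(msg[6:8]),
		AuthorityCount:     binary.BigEndian.Uint16(msg[8:10]),
		AdditionalCount:    binary.BigEndian.Uint16(msg[10:12]),
	}, nil
}

func main() {
	// A query header: ID 0x1234, Recursion Desired set, one question.
	raw := []byte{0x12, 0x34, 0x01, 0x00, 0x00, 0x01, 0, 0, 0, 0, 0, 0}
	h, err := parseHeader(raw)
	if err != nil {
		fmt.Println("parse:", err)
		return
	}
	fmt.Printf("%+v\n", h)
}
```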
Question format #
Field | Type | Description |
---|---|---|
Name | Domain Name | The domain name, encoded as a sequence of labels as described below |
Type | 2-byte Integer | The record type |
Class | 2-byte Integer | The class, in practice always set to 1 |
Resource Record format #
All record types share a common header, followed by a section for resource data, whose length depends on the record type.
Header #
Field | Type | Description |
---|---|---|
Name | Domain Name | The domain name, encoded as a sequence of labels as described below |
Type | 2-byte Integer | The record type |
Class | 2-byte Integer | The class, in practice always set to 1 |
TTL | 4-byte Integer | Time To Live, i.e. how long a record can be cached before it should be requeried |
Len | 2-byte Integer | Length of the record type specific data |
Type A Resource Data #
A `Type A` resource record describes the IPv4 address associated with a domain, so it only needs a single field of 4 bytes.
Field | Type | Description |
---|---|---|
IP | 4-byte Integer | An IPv4 address encoded as a four-byte integer |
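Putting the common header and the resource data together, a `Type A` record can be serialized as shown in this sketch. The function name `appendARecord`, the TTL of 60 seconds and the `example.com` name are assumptions for illustration; the name is passed in already encoded as length-prefixed labels.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendARecord appends a Type A, Class IN resource record to msg.
// encodedName must already be in wire format (length-prefixed labels).
func appendARecord(msg, encodedName []byte, ttl uint32, ip [4]byte) []byte {
	msg = append(msg, encodedName...)
	msg = binary.BigEndian.AppendUint16(msg, 1) // Type: A
	msg = binary.BigEndian.AppendUint16(msg, 1) // Class: IN
	msg = binary.BigEndian.AppendUint32(msg, ttl)
	msg = binary.BigEndian.AppendUint16(msg, 4) // Len: 4 bytes of data
	return append(msg, ip[:]...)
}

func main() {
	name := []byte("\x07example\x03com\x00")
	record := appendARecord(nil, name, 60, [4]byte{0, 0, 0, 42})
	fmt.Printf("% x\n", record)
}
```

The `Append*` helpers on `binary.BigEndian` require Go 1.19 or later.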
Parsing #
DNS messages are written using Big Endian (network) byte order, so I’m going to use the functions provided by `binary.BigEndian` to parse them: looking at how DNS messages are structured, I can simply read fields sequentially.
Note: reading sequentially wouldn’t work when parsing responses with compressed domains, for which one would need to instruct the parser to jump to the Question section, read the domain name, then go back to the previous point; but I’m simply forwarding responses without parsing them, so I can keep my parser simple.
Domain name #
Fields with the `Domain Name` type represent domain names (e.g. `www.duckduckgo.com`) written as follows:
3www10duckduckgo3com0
or, adding whitespace to make it more readable:
3 www 10 duckduckgo 3 com 0
Interestingly, there are no dots separating domain name parts: instead, each of them is prefixed by 1 byte indicating the length of the following part (3 for `www`, and so on); the sequence is then terminated by a single `0` byte.
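The encoding is simple enough to write by hand in both directions. The sketch below uses made-up function names (`encodeName`, `decodeName`) and, like the parser described above, deliberately rejects compressed names rather than following pointers.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// encodeName turns "www.duckduckgo.com" into length-prefixed labels
// terminated by a zero byte.
func encodeName(name string) ([]byte, error) {
	var out []byte
	for _, label := range strings.Split(strings.TrimSuffix(name, "."), ".") {
		if len(label) == 0 || len(label) > 63 {
			return nil, fmt.Errorf("invalid label %q", label)
		}
		out = append(out, byte(len(label)))
		out = append(out, label...)
	}
	return append(out, 0), nil
}

// decodeName reads a name starting at offset and returns it together with
// the offset of the first byte after the terminating zero.
func decodeName(msg []byte, offset int) (string, int, error) {
	var labels []string
	for {
		if offset >= len(msg) {
			return "", 0, errors.New("truncated name")
		}
		n := int(msg[offset])
		offset++
		if n == 0 {
			break // the zero byte ends the name
		}
		if n > 63 {
			return "", 0, errors.New("compressed or invalid label")
		}
		if offset+n > len(msg) {
			return "", 0, errors.New("truncated label")
		}
		labels = append(labels, string(msg[offset:offset+n]))
		offset += n
	}
	return strings.Join(labels, "."), offset, nil
}

func main() {
	encoded, err := encodeName("www.duckduckgo.com")
	if err != nil {
		fmt.Println("encode:", err)
		return
	}
	fmt.Printf("%q\n", encoded)

	name, next, err := decodeName(encoded, 0)
	fmt.Println(name, next, err) // www.duckduckgo.com 20 <nil>
}
```

Note that the length prefixes are raw byte values, not ASCII digits: the `10` before `duckduckgo` is the single byte `0x0A`.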