Building a data lineage visualization tool

At Redgate we’ve recently started a project to build a data lineage visualization tool for Hadoop. This post is an assorted collection of some of our learnings and experiences so far.

Tech choices

Part of our remit in Ventures is to explore new platforms, so we decided to build a web app (with JavaScript) rather than a desktop application. We chose d3 because it is powerful and has lots of features built in. This turned out to be a great choice: it allowed us to try out lots of different things quickly to get feedback, and it kept our codebase small (less code means fewer bugs).

Exploring ideas

Instead of building a single prototype, we started by producing multiple separate experiments to show different ideas. Each idea was a single page, and the pages didn’t interact. This had some advantages, such as allowing us to work independently, so the team was producing ideas in parallel with no code merges or interaction bugs. The plan was that we would end up with lots of ideas and could then combine the best elements into a final design. However, we found that it was hard to judge elements in isolation. When we asked for feedback, people would get hung up on one thing that they liked and say “this one is the best” rather than telling us which elements were good. Additionally, sometimes there would be no ‘best’: the most suitable solution depended on the structure of the input data, or on the exact task that the user was interested in.

We realized that we would need to see the different combinations, and even give our users some options. It was time to combine ideas into one display. However, we didn’t want to lose all the advantages of working independently. Rather than adding zillions of feature flags to one giant file, we took the time to make an API that allowed us to keep different features separate. It was a classic case of going slow to go fast. Even though we are building a prototype and the code may not survive for long, it was worth taking a few hours to get the design right. Now we have something flexible and can quickly try new ideas.

Design details

We have check boxes and radio buttons to control which features are enabled, and we use event-emitter.js to give us integration points in our code. We are careful to ensure that the order of enabling features doesn’t matter and that they don’t interact in unexpected ways. For example, multiple features can ‘hide’ some elements of our visualization, so hiding is additive: everything is unhidden once in the main code, and no feature is allowed to unhide an element, as this would interfere with other features.
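To make the additive hiding rule concrete, here is a minimal sketch of how it could look. It is not our actual code: the tiny `Emitter` class is a stand-in written for this example (it is not the event-emitter.js API), and the feature names, the `render` event, and the node fields `temp` and `links` are all hypothetical.

```javascript
// Minimal stand-in emitter for illustration only (NOT the event-emitter.js API).
class Emitter {
  constructor() {
    this.handlers = new Map();
  }
  on(name, fn) {
    if (!this.handlers.has(name)) this.handlers.set(name, []);
    this.handlers.get(name).push(fn);
  }
  emit(name, payload) {
    for (const fn of this.handlers.get(name) || []) fn(payload);
  }
}

const bus = new Emitter();
const enabled = new Set(); // names of features currently switched on

// Called by the (hypothetical) check box handlers.
function setFeature(name, on) {
  if (on) enabled.add(name);
  else enabled.delete(name);
}

// A feature may only ADD ids to the hidden set; it never removes any.
function registerFeature(name, shouldHide) {
  bus.on('render', ({ nodes, hidden }) => {
    if (!enabled.has(name)) return;
    for (const node of nodes) {
      if (shouldHide(node)) hidden.add(node.id);
    }
  });
}

// Hypothetical example features.
registerFeature('hideTempTables', (node) => node.temp);
registerFeature('hideOrphans', (node) => node.links === 0);

// Main code: start from "nothing hidden" exactly once per render,
// then let every enabled feature contribute to the hidden set.
function render(nodes) {
  const hidden = new Set();
  bus.emit('render', { nodes, hidden });
  return nodes.filter((n) => !hidden.has(n.id)).map((n) => n.id);
}
```

Because the hidden set is rebuilt from empty on every render and features can only add to it, the visible result depends on which features are enabled, not on the order in which they were switched on.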


Display options and “magic numbers” are pulled out into a separate file so that designers can easily play with them.
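As a rough illustration of what such a file might contain, here is a sketch. The option names and values are invented for this example and are not taken from our project; the point is only that drawing code reads from one object instead of hard-coding numbers.

```javascript
// Hypothetical display options; names and values are illustrative only.
const displayOptions = Object.freeze({
  nodeRadius: 8,
  linkStrokeWidth: 1.5,
  labelFontSize: 12,
  hiddenOpacity: 0.1,
});

// Drawing code looks values up here rather than embedding magic numbers.
function nodeStyle(isHidden) {
  return {
    r: displayOptions.nodeRadius,
    opacity: isHidden ? displayOptions.hiddenOpacity : 1,
  };
}
```

A designer can then change, say, `nodeRadius` in one place and see the effect everywhere it is used.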

Getting good feedback

The first time we showed our prototype around, it got a lot of criticism. Some of it was superficial: “I don’t like that color”, “Can you make this bigger?”. We spent a while explaining that yes, we know it’s ugly, but is it useful? How can we make it more useful? We realized that the ugliness was distracting and was preventing us from getting good feedback. Once we made it look pretty, we were able to have discussions about what users would do with it and whether it was useful. Being able to enable every feature from within the interface with a single click also helped. We would be told “I want x” and be able to turn x on and ask “now what do you want?”. We didn’t build a minimum viable product, but we did get good feedback (and by building on d3 we got it quickly).

Distractingly ugly


Good enough to get feedback
