23 May 2014

Building a data lineage visualization tool

At Redgate we’ve recently started a project to build a data lineage visualization tool for Hadoop. This post is an assorted collection of our learnings and experiences so far.

Tech choices

Part of our remit in Ventures is to explore new platforms, so we decided to build a web app (in JavaScript) rather than a desktop application. We chose d3 because it is powerful and has lots of features built in. This turned out to be a great choice: it let us try out lots of different things quickly to get feedback, and it kept our codebase small (less code means fewer bugs).

Exploring ideas

Instead of building a single prototype, we started by producing multiple separate experiments to show different ideas. Each idea was a single page, and the pages didn’t interact. This had some advantages: we could work independently, so the team was producing ideas in parallel with no code merges or interaction bugs. The plan was to end up with lots of ideas and then combine the best elements into a final design. However, we found that it was hard to judge elements in isolation. When we asked for feedback, people would get hung up on one thing that they liked and say “this one is the best” rather than telling us which elements were good. Additionally, sometimes there was no ‘best’: the most suitable solution depended on the structure of the input data, or on the exact task that the user was interested in.

We realized that we would need to see the different combinations, and even give our users some options. It was time to combine the ideas into one display. However, we didn’t want to lose the advantages of working independently. Rather than adding zillions of feature flags to one giant file, we took the time to make an API that allowed us to keep different features separate. It was a classic case of going slow to go fast. Even though we are building a prototype, and the code may not survive for long, it was worth taking a few hours to get the design right. Now we have something flexible and can quickly try new ideas.

Design details

We have check boxes and radio buttons to control which features are enabled, and we use event-emitter.js to give us integration points in our code. We are careful to ensure that the order in which features are enabled doesn’t matter and that features don’t interact in unexpected ways. For example, multiple features can ‘hide’ some elements of our visualization, so hiding is additive. Everything is unhidden once in the main code, and no feature is allowed to unhide an element, as this would interfere with other features.
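As a rough illustration of the additive hiding described above, here is a minimal sketch. It is not our actual code: it uses Node’s built-in EventEmitter as a stand-in for event-emitter.js, and the names createVisualization, addFeature, setEnabled and the "render" event are all made up for this example.

```javascript
// Assumption: Node's built-in EventEmitter stands in for event-emitter.js.
const { EventEmitter } = require("events");

// Hypothetical visualization core. It emits a "render" event on every redraw,
// and features hook into that event instead of editing the core.
function createVisualization(nodes) {
  const events = new EventEmitter();
  const features = new Map(); // feature name -> { enabled }

  return {
    // Register a feature that hides every node matching the `hide` predicate.
    addFeature(name, hide) {
      const feature = { enabled: false };
      features.set(name, feature);
      events.on("render", (state) => {
        if (!feature.enabled) return;
        for (const node of nodes) {
          // Features may only add to the hidden set, never remove from it.
          if (hide(node)) state.hidden.add(node.id);
        }
      });
    },

    // This is what a check box in the interface would call.
    setEnabled(name, enabled) {
      features.get(name).enabled = enabled;
    },

    // Returns the ids of the nodes that remain visible.
    render() {
      const state = { hidden: new Set() }; // everything is unhidden once, here
      events.emit("render", state);
      return nodes.filter((n) => !state.hidden.has(n.id)).map((n) => n.id);
    },
  };
}

// Usage with made-up lineage data.
const vis = createVisualization([
  { id: "raw_logs", type: "table", depth: 0 },
  { id: "clean_job", type: "job", depth: 1 },
  { id: "report", type: "table", depth: 2 },
]);
vis.addFeature("hideJobs", (n) => n.type === "job");
vis.addFeature("hideDeep", (n) => n.depth > 1);
vis.setEnabled("hideDeep", true);
vis.setEnabled("hideJobs", true);
console.log(vis.render()); // only raw_logs remains visible
```

Because each feature only ever adds ids to a set that is rebuilt on every render, the result is the same whichever check box is ticked first, and turning one feature off cannot accidentally reveal something another feature has hidden.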


Display options and “magic numbers” are pulled out into a separate file so that designers can easily play with them.
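A minimal sketch of what such an options file might look like follows. The option names, values and helper functions here are invented for illustration and are not taken from our codebase.

```javascript
// Hypothetical display options. In a real project these would live in their
// own file so that designers never have to touch the drawing code.
const defaultOptions = Object.freeze({
  nodeRadius: 8, // px
  nodeSpacing: 40, // px between neighbouring nodes
  hiddenOpacity: 0.1,
  tableColor: "#4682b4",
  jobColor: "#ff8c00",
});

// Merge experimental overrides on top of the defaults without mutating them.
function resolveOptions(overrides = {}) {
  return Object.freeze({ ...defaultOptions, ...overrides });
}

// Drawing code reads every number and color from the options object,
// so it contains no magic numbers of its own.
function nodeStyle(node, options) {
  return {
    r: options.nodeRadius,
    fill: node.type === "job" ? options.jobColor : options.tableColor,
  };
}

// Usage: try a larger node radius without editing the defaults.
const options = resolveOptions({ nodeRadius: 12 });
console.log(nodeStyle({ id: "clean_job", type: "job" }, options));
```

Keeping the defaults frozen and applying overrides separately makes it easy to experiment with a value and then throw the experiment away.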

Getting good feedback

The first time we showed our prototype around, it got a lot of criticism. Some of it was superficial: “I don’t like that color”, “can you make this bigger?”. We spent a while explaining that yes, we knew it was ugly, but was it useful, and how could we make it more useful? Eventually we realized that the ugliness was distracting and was preventing us from getting good feedback. Once we made it look pretty, we were able to have discussions about what users would do with it and whether it was useful. Being able to enable every feature from within the interface with a single click also helped. We would be told “I want x” and could turn x on and ask “now what do you want?”. We didn’t build a minimum viable product, but we did get good feedback (and by building on d3 we got it quickly).

Distractingly ugly


Good enough to get feedback
