Josef Richberg: DBA of the Day

As the winner of the Exceptional DBA Awards 2010 is announced, we take a moment to recognize last year's winner Josef Richberg. We sent Richard Morris to meet Josef to find out what has happened in the past year, and if life has changed for the Exceptional DBA of 2009.

“Also, if you don’t eat, sleep, breathe SQL in some way, shape or form, do not get into this field as you will not be happy.”

Josef Richberg, who was selected as last year’s Exceptional DBA, has more than 16 years of DBA experience and works for HarperCollins, where he designs and creates SSIS packages. He became involved with RDBMSs around 17 years ago, working first with Informix and then progressing to Sybase, where he worked for their Professional Services division. He taught at Sybase University before moving to Microsoft SQL Server when he realised Sybase was, as he puts it, ‘no longer a viable player in the RDBMS arena’.

Josef lives in Old Forge, Pennsylvania, just a few minutes from Scranton, where he grew up. He is married and has a 9-year-old son.

RM:
Would you say your working life has changed since you won the Exceptional DBA of the Year Award? Why do you think it’s important the industry recognises the work DBAs do? And what has given you most satisfaction?
JR:
Actually, my working life has changed very little since I won the award. DBAs are like specialists and therefore few people understand what we do. How can you appreciate someone winning an award if you don’t understand what it is for?

The Exceptional DBA of the Year Award means a great deal to me, because it means I have been identified by the community of my peers as someone who stands out.

Having won the award has opened up new doors for me to give back to the community. I’ve been doing this by giving presentations through PASS, connecting in the SQL Twitter community, and reaching a larger audience with my blog. It also allows me to continue my learning, not in a vacuum, but in a solid community (the SQL Twitter community and PASS). This has given me the greatest satisfaction.

RM:
How much does trying to be an exceptional DBA matter to the job you do?
JR:
I don’t think that applies to your job so much as it is a result of personal drive. I always do my job to the best of my ability. Being an exceptional DBA comes out of who you are, which becomes evident by the things you do.
RM:
One of the aspects of being a DBA is that you have to absorb knowledge pretty quickly. How do you tackle the problem of understanding a technology which you haven’t seen before?
JR:
I read as much as I can about the technology and then try to solve real problems with it. An example is the new programming language I am trying to learn, Erlang.

There are only 3 books published on the subject, so I bought them all. I will always buy more than one book on a given subject, if possible, to give me alternate perspectives. I find it easier to learn this way.

I then need some problem to solve in order to bridge the gap between reading about something and learning it.

RM:
Although you’re just starting with the language, is there anything you’ve found difficult to work with in Erlang?
JR:
Everything! It is a functional language, so the concepts are totally different from object-oriented programming. As an example, all variables are immutable. Functions are objects as well and can be passed into other functions. In most other languages, to create a counter for, say, a loop, I just create a variable and add to it:

In Erlang I would create a function that checks to make sure I have not crossed my limit (10) and call it recursively.  Here is an example from one of the 3 books published on the language (Erlang and OTP in Action)

Elegant, but something my brain is having issues with. Couple that with the fact that there are only 3 books published on the subject, and it is difficult for me to get my head around it. I am trying to find a few problems that I can solve using the language, but nothing yet.
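As a rough illustration of the contrast JR describes, here is a minimal sketch in Python rather than Erlang. It is not the example from Erlang and OTP in Action; the function names are hypothetical and chosen only for this illustration. The first function keeps a counter by updating a variable in place, while the second never reassigns anything and instead checks the limit (10) and calls itself with a new value.

```python
def count_with_mutation(limit=10):
    """Imperative style: a single variable is updated in place on each pass."""
    counter = 0
    while counter < limit:
        counter += 1          # the same name is rebound to a new value
    return counter


def count_recursively(current=0, limit=10):
    """Functional style: nothing is reassigned; each call receives a new value."""
    if current >= limit:      # the limit check that ends the recursion
        return current
    return count_recursively(current + 1, limit)


print(count_with_mutation())  # 10
print(count_recursively())    # 10
```

Both functions arrive at the same result; the difference is only in how the running count is carried from one step to the next.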

RM:
How do you stay on top of new emerging technologies?
JR:
Believe it or not, the SQL community on Twitter is a vast source of knowledge on both SQL and other technologies. Many of us in that community have varied backgrounds and SQL is more like a conjunction. I myself started as a Sybase DBA and Java programmer. I have learned about networking, SANs, and, dare I say it, NoSQL from that same SQL Twitter community. I read the blogs of my contemporaries. I read blogs on anything that catches my eye. I started learning Erlang because of a small article I saw on Slashdot. I read a web article (again thanks to Twitter) about how LinkedIn uses a variety of technologies including something called a ‘Bloom Filter’, so I went to Wikipedia to look up what that is. Each piece of information leads to another and another.
RM:
Good heavens! You trust Wikipedia?
JR:
I consider it a source of information, just like any other. I have found it reliable and useful. I have been able to validate the information by looking up things I already know (which I sometimes use for pictures or to provide a more concise way of explaining something in a presentation).
RM:
Your SSIS site is popular among the SQL Server cognoscenti. Why did you decide to share your knowledge in this way, and what do you get out of it?
JR:
There are only a few ways that knowledge can be shared. One of them is through a book and I haven’t been offered a publishing contract just yet! The other way is a blog, which unlike a book is a dynamic environment.

I’m discovering new things about SSIS all the time. I can publish to it as soon as I have something I think will be useful to others.

There are a number of things that I get out of writing this blog. I get the opportunity to support the community. I get feedback which I learn from, either by refining the technique or knowledge in the post or learning that my technique or test was flawed in some way.

This is part of my philosophy as it were. I learn from someone who knows, improve upon it if I can, and pass it on to others who are looking to learn.

RM:
Are there SQL Server issues that crop up again and again from users of your site?
JR:
There are two issues that seem to keep cropping up. The first is importing files (images or documents) into SQL Server and the other is about the Enhanced Threading Framework, or ETF.
RM:
Ah, that’s my next question. What is ETF, how did you come to develop it, and was there a specific problem you were trying to overcome?
JR:
It was designed to solve a specific problem. I work at HarperCollins Publishers where there is a surprising amount of data about books. Data about who sold which book to whom at what price and what quantity is rather important to sales people. There was an existing process to do this, which took over 40 hours. Needless to say this could only be done on a weekend and if anything happened during that weekend, the process was not run and the data became stale. Add to this the fact that the application needed an overhaul and would time out often. This meant the specific system used to display this information was all but forgotten.

I started looking at SSIS to solve the problem of time. When data changes daily, a weekly snapshot is not really worth anything. When I started breaking apart the process I really wanted to use the parallel features SSIS had to offer. I love parallel programming. I used to look for ways to use threads when I programmed in Java. It was from this that the ETF was born.

The Enhanced Threading Framework is a way to set up your SSIS package to more efficiently do work in parallel. The concept is based around the ‘Producer’/‘Consumer’ model of work distribution.

You define a ‘unit of work’. This can be anything from a SQL statement to a file that needs to be ingested and processed. This information is stored in a SQL Server table, with one row for each defined unit of work. This table acts like a queue. This would be a pre-processing phase.

You create a stored procedure that pulls the record from the table/queue. I have examples of how the SQL statement that is used to extract that single row of work should be written to allow for maximum concurrency. The procedure to pull the work also has additional information to let the engine know if there is anything to process.
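The idea of a table acting as a queue, with a procedure that hands back one unit of work plus a flag saying whether anything was left, can be sketched roughly as follows. This is only an illustration under loud assumptions: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, the table and column names (work_queue, payload) are hypothetical, and it is not JR's actual stored procedure or SQL statement. It runs on a single connection, so it does not address the locking needed when several engines pull from the same SQL Server table at once.

```python
import sqlite3


def create_queue(conn, units):
    """Pre-processing: store one row per unit of work (names are hypothetical)."""
    conn.execute(
        "CREATE TABLE work_queue (id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO work_queue (payload) VALUES (?)", [(u,) for u in units]
    )
    conn.commit()


def pull_unit_of_work(conn):
    """Remove and return one row as (has_work, payload).

    (False, None) tells the calling engine that the queue is empty.
    """
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT id, payload FROM work_queue ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return False, None
        conn.execute("DELETE FROM work_queue WHERE id = ?", (row[0],))
        return True, row[1]


conn = sqlite3.connect(":memory:")
create_queue(conn, ["SELECT 1", "import report.pdf"])
print(pull_unit_of_work(conn))
print(pull_unit_of_work(conn))
print(pull_unit_of_work(conn))
```

The third call finds the table empty and reports that there is nothing left to process, which is the signal an engine would use to stop asking for work.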

You then define all the necessary SSIS objects required to process that unit of work. This combined with the procedure to check the queue for work makes up the ‘Engine’. SSIS runs each non-connected object as a separate thread. So if you group all of the necessary processing (linking objects together within an Engine structure) and then have, say, 4 engine structures, you will run that processing 4 at a time. If one of those engines has a piece of work that takes an unusual amount of time, it will not request another unit of work until its current one is complete. This framework provides a more robust system than simply parcelling up work into, say, 4 groups and having each engine work on its share.
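Why pulling work on demand holds up better than dividing it into fixed groups can be seen with a small back-of-the-envelope simulation. The sketch below is an illustration only: the durations are made-up numbers, the function names are hypothetical, and it models scheduling in plain Python rather than anything SSIS does internally.

```python
import heapq


def static_split_finish_time(durations, engines):
    """Pre-assign work in contiguous, equally sized groups, one group per engine."""
    if not durations:
        return 0
    size = -(-len(durations) // engines)  # ceiling division
    groups = [durations[i:i + size] for i in range(0, len(durations), size)]
    return max(sum(group) for group in groups)


def pull_based_finish_time(durations, engines):
    """Each engine requests the next unit of work as soon as it becomes free."""
    free_at = [0] * engines               # time at which each engine is idle
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)    # the earliest idle engine takes the unit
        heapq.heappush(free_at, start + duration)
    return max(free_at)


durations = [8, 8, 1, 1, 1, 1, 1, 1]      # made-up timings: two slow units, six fast
print(static_split_finish_time(durations, 4))  # 16
print(pull_based_finish_time(durations, 4))    # 8
```

With a fixed split, the engine that happens to receive both slow units finishes long after the others have gone idle. With pulling, the slow units occupy two engines while the remaining engines clear the rest of the queue.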

As an example I have one SSIS package which needs to import .pdfs into a SQL Server table.

  1. Pre-processing: I need to ingest a set of files from multiple directories so I read in the full path of every file. (I use a C# script to do this).
  2. Extract a unit of work: I have a stored procedure that extracts a single file path and two other pieces of information pushed there by the pre-processing stream.
  3. Engine: I have a set of SSIS objects that imports the file, creates a hash, validates the record is unique, pulls about a half dozen additional pieces of information from other SQL tables and keeps a counter of the type of file.
  4. Go to step 1 until no more work.
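The overall shape of the steps above (queue up the work, have each engine pull one unit at a time, stop when nothing is left) can be sketched with threads in Python. This is a loose analogy under stated assumptions, not the SSIS implementation: run_engines and fingerprint are hypothetical names, an in-memory queue stands in for the SQL Server table, and in-memory bytes stand in for the files being imported and hashed.

```python
import hashlib
import queue
import threading


def run_engines(units, process, engine_count=4):
    """Queue every unit of work, then let engine_count engines drain the queue."""
    work = queue.Queue()
    for unit in units:                     # pre-processing: one entry per unit
        work.put(unit)

    results = []
    results_lock = threading.Lock()

    def engine():
        while True:
            try:
                unit = work.get_nowait()   # extract a single unit of work
            except queue.Empty:
                return                     # no more work: this engine stops
            outcome = process(unit)        # the engine's processing step
            with results_lock:
                results.append(outcome)

    threads = [threading.Thread(target=engine) for _ in range(engine_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def fingerprint(data: bytes) -> str:
    """Stand-in for the hashing step: a SHA-256 digest of the content."""
    return hashlib.sha256(data).hexdigest()


digests = run_engines([b"first file", b"second file", b"third file"], fingerprint)
print(len(digests))  # 3
```

Because each engine only asks for more work after finishing its current unit, the order of the results is not guaranteed, which is why the usage example reports the count rather than the order.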
JR:
I would love for Microsoft to pick this up and create visual components that make tying this together a little easier. An ‘Engine’ component that works like a Foreach Loop container, you drop in all of your objects that are needed to process a unit of work and select from a drop down the method you will use to get that unit of work (procedure, ADO object, etc).

In the meantime I am working on a few enhancements. One of these enhancements is to be able to dynamically start/stop the number of engines. I have run into situations where the load process runs into business hours and needs to be scaled back to prevent server strain. Under the current design you either let it go or completely stop it until there is enough processing power to continue. It would be nice to be able to slow it down, allowing it to complete during business hours and not have to wait for the next nightly run. This will be done either with Service Broker or using a SQL Server table where commands are passed. The solution needs to be efficient and easy to use (which is another word for elegant).

Another enhancement that I alluded to in a recent post is to use a local process and named pipes as a queue instead of a SQL Server table. The engines no longer use a stored procedure to extract information from a SQL Server table; instead, I use a Script Component to request that information from a named pipe, which is being supplied by a process started within the pre-processing phase of the SSIS package. I am curious to see if having a queue local to the SSIS box would provide sufficient improvement over the thousands of round-trip requests to a remote SQL Server box (we house SSIS on independent boxes so SQL Server is always remote).

RM:
Which aspects of being a DBA get you most excited and what most depresses you?
JR:
Performance and tuning, and innovation, excite me most. Developing something brand new (like my Enhanced Threading Framework) is exciting. Taking an existing process and dramatically improving it is another reason why I do this job.

What depresses me is a little more involved. I don’t like saying “I told you so”; I’d have preferred to simply solve the problem when it first came up. What depresses me is identifying a problem but not being given the proper tools, be it money, time, or resources, to solve it.

RM:
Who would you say has taught you most about being a DBA? Your Zen master, if you like?
JR:
John McVicker. He was a Principal Consultant at Sybase Professional Services, where I worked in the same group as a Senior Consultant.
RM:
Do you have any advice for up-and-coming DBAs?
JR:
Use your curiosity as a tool. Being a DBA is no longer just about the SQL software or database design. It encompasses servers, SANs, networks, and Microsoft’s CLR. Read up on new technologies, even if you don’t think they are relevant right now, as they might become relevant. Also, if you don’t eat, sleep, breathe SQL in some way, shape or form, do not get into this field as you will not be happy.
RM:
It’s said that some DBAs live chaotic lives, in that they always seem to be on call. Is it the same for you, and have you ever considered a change of career?
JR:
This is one of the first jobs in which I am not on call 24/7. I work within the data warehouse group, but don’t confuse that with the traditional sense of a data warehouse with star schemas and the like. This is simply servers that contain large volumes of data. We process the previous day’s transactions and provide them for the business. There are business-critical applications that rely on this “day-after” data, which means if a job fails for some reason we will get a call; otherwise we can get a good night’s sleep.

I have thought of changing careers and only one comes to mind: bakery. I love to bake and it has gone from a hobby to a passion. I told my wife that if we ever won the lottery I would open a bakery. It’s one of the few professions where there is little disagreement; a chocolate chip cookie is a chocolate chip cookie and no one counts the number of chips.

Until that time, this is one of the few professions that enables me to create.

RM:
How would you make money from your skills if you weren’t in the job that you are in now?
JR:
Living in a small town as I do, to continue as a DBA I would most likely have to find a telecommuting job. If that were not available I would slide over to being a programmer. Travel is not an option, not until our son is older as he’s only 9 years of age.
RM:
You were diagnosed with Multiple Myeloma which affects the plasma cells and indirectly causes calcium to be extracted from the bones, causing them to fracture. Has the illness affected the way you work or the way you think about your job, and what’s your latest prognosis?
JR:
I was diagnosed in the middle of September of 2009, due to a fracture in my lower spine and one in my upper. That is not how the illness sprang up, though, because no one can tell me how long I have had the illness. I spent 30 days in the hospital and 15 days in rehabilitation, which put me back home the first week of November. At this point I was wearing a brace and moving around with a cane. I had very limited mobility. HarperCollins Publishers has been incredibly supportive, has allowed me to work from home since January 2010, and continues to be supportive as I undergo various rounds of treatment. The prognosis is very good. I am currently in remission and I am planning on undergoing Stem Cell Replacement therapy in the near future.

The illness has caused me to think about my job and my family. We plan a few more vacations. It is a difficult balancing act. You want to plan for the future, yet when the future might come sooner than anticipated, you take a little more time from the future and use it in the present.

I have decided that while my job is important to me, I will not allow it to affect me as much as it has in the past. This is simply to keep my stress level down and allow my body the best chance to process the drugs necessary to combat this illness. This is a little difficult as I am passionate about my job and it is difficult to let things go, but I am working on that.