Performance Considerations: Binary Serialization Efficiencies

We were meeting with a client recently who was experiencing major performance issues with one of their key applications. Their staff had identified disk I/O as the culprit – there was simply a ton of data to read from and write to disk, and it was creating a huge bottleneck. Understand that there was no database involved – the analysis process consisted of a series of small command-line applications, each of which would take a CSV file as input, massage the data, and output a CSV file that would then serve as the input for the next command-line application. Although each individual app was relatively small, the entire system was quite large when you put them all together.

Most of our discussions revolved around bringing in higher-speed single drives or implementing a RAID array to help with I/O performance, because they were looking for a quicker fix than rebuilding the application – but I started to think about how to increase application performance without a major rewrite, because that's way more fun to think about than a hardware solution (to me, at least). Since their data was coming in as CSV files, I figured they were reading the numbers as strings, converting those numbers to integers, and storing them in some sort of data structure. So I began wondering what kind of performance gain they could get if the data was read in once and then serialized back and forth using .NET binary serialization.
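
To make that concrete, the two read paths might look something like this – a minimal sketch, not the client's actual code; the class and method names are placeholders, and I'm assuming the values are plain ints:

```csharp
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

static class ReadPaths
{
    // What I suspected the client's apps were doing on every run:
    // read the CSV as text, split it, and parse every field back into a number.
    public static int[] ReadCsv(string path)
    {
        string[] fields = File.ReadAllText(path).Split(',');
        int[] values = new int[fields.Length];
        for (int i = 0; i < fields.Length; i++)
            values[i] = int.Parse(fields[i]);
        return values;
    }

    // The alternative: deserialize the array straight from a file written
    // earlier with BinaryFormatter, with no splitting or parsing involved.
    public static int[] ReadBinary(string path)
    {
        using (FileStream stream = File.OpenRead(path))
            return (int[])new BinaryFormatter().Deserialize(stream);
    }
}
```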

My test consisted of generating byte arrays of various lengths and writing each one out twice: once as a CSV file and once as a binary file produced by the .NET binary serializer. I then read each file back into memory 25,000 times and calculated the operations per second for each method, and I also recorded the file size (in bytes). My first run only included reading the text into a string array – it did not include parsing the strings into numbers. A sketch of the test harness and the findings follow.
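
If you want to reproduce this, a harness along the following lines captures the idea. It is a sketch rather than the exact code: the random seed, file names, and helper names are placeholders, and only the 25,000-read loop and the byte-array data come from the description above.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Runtime.Serialization.Formatters.Binary;

class SerializationBenchmark
{
    const int Reads = 25000;

    static void Main()
    {
        Console.WriteLine("Items\tSize (Text)\tSize (Binary)\tOps/Sec (Text)\tOps/Sec (Binary)");

        foreach (int items in new[] { 100, 1000, 10000 })
        {
            // Generate the test data: a byte array of the requested length.
            byte[] data = new byte[items];
            new Random(42).NextBytes(data);

            // Write the same data out as comma-separated text and as a binary-serialized file.
            File.WriteAllText("data.csv", string.Join(",", data.Select(b => b.ToString()).ToArray()));
            using (FileStream stream = File.Create("data.bin"))
                new BinaryFormatter().Serialize(stream, data);

            // Read each file back repeatedly and convert the elapsed time into operations per second.
            double textOps = Reads / Time(() => File.ReadAllText("data.csv").Split(','));
            double binaryOps = Reads / Time(() =>
            {
                using (FileStream stream = File.OpenRead("data.bin"))
                    return (byte[])new BinaryFormatter().Deserialize(stream);
            });

            Console.WriteLine("{0}\t{1}\t{2}\t{3:F2}\t{4:F2}", items,
                new FileInfo("data.csv").Length, new FileInfo("data.bin").Length,
                textOps, binaryOps);
        }
    }

    // Runs the supplied read operation Reads times and returns the total elapsed seconds.
    static double Time(Func<object> read)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < Reads; i++)
            read();
        return watch.Elapsed.TotalSeconds;
    }
}
```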

Items Size (Text) Size (Binary) Ops/Sec (Text) Ops/Sec (Binary)
100 359 128 14.03 9.07
200 710 228 12.54 11.07
300 1071 328 10.37 10.92
400 1421 428 9.69 11.06
500 1773 528 8.62 10.90
600 2129 628 7.64 10.93
700 2484 728 7.01 11.17
800 2882 828 6.51 11.13
900 3218 928 6.15 10.91
1000 3592 1028 6.04 10.85
2000 7154 2028 3.61 10.83
3000 10719 3028 2.54 10.65
4000 14305 4028 1.99 10.49
5000 17793 5028 1.55 9.53
6000 21353 6028 1.31 8.16
7000 25046 7028 1.14 8.25
8000 28557 8028 0.97 9.24
9000 32083 9028 0.89 8.04
10000 35709 10028 0.71 9.28
11000 39221 11028 0.66 9.12
12000 42789 12028 0.61 9.19
13000 46389 13028 0.57 8.85
14000 49953 14028 0.53 9.07
15000 53482 15028 0.50 8.99

As you can see, the text import quickly degrades – I graphed it out, and the throughput falls off steeply, roughly in inverse proportion to the item count at the larger sizes. File sizes follow a linear trend, with the binary file about 30% of the size of the text file: the binary representation is one byte per item plus a fixed header of about 28 bytes, while the text representation averages roughly 3.6 bytes per item. For smaller item counts (under 300), it looks like the text import works faster than the binary import.

Next, I tried to see what the performance would look like when each text element is parsed into a byte. The only change on the text side is the parse step, sketched below; the results follow.
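
Again, this is just a sketch with placeholder names, not the exact code:

```csharp
using System.IO;

static class CsvReader
{
    // Second run: same read and split as before, but each field is now parsed into a byte.
    public static byte[] ReadAndParse(string path)
    {
        string[] fields = File.ReadAllText(path).Split(',');
        byte[] values = new byte[fields.Length];
        for (int i = 0; i < fields.Length; i++)
            values[i] = byte.Parse(fields[i]);
        return values;
    }
}
```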

Items Size (Text) Size (Binary) Ops/Sec (Text) Ops/Sec (Binary)
100 359 128 11.20 11.06
200 719 228 8.55 10.94
300 1065 328 6.67 11.02
400 1434 428 5.60 11.03
500 1798 528 4.84 11.00
600 2153 628 4.21 11.07
700 2512 728 3.76 11.00
800 2864 828 3.41 10.96
900 3210 928 3.12 10.95
1000 3554 1028 2.89 10.95
2000 7176 2028 1.61 10.78
3000 10761 3028 1.11 10.64
4000 14253 4028 0.86 10.14
5000 17820 5028 0.68 9.61
6000 21340 6028 0.57 7.87
7000 24936 7028 0.50 9.45
8000 28528 8028 0.43 8.08
9000 32084 9028 0.38 8.03
10000 35638 10028 0.33 8.09
11000 39234 11028 0.30 9.31
12000 42886 12028 0.28 9.35
13000 46377 13028 0.26 9.24
14000 49991 14028 0.24 9.14
15000 53499 15028 0.23 8.99

When the parsing step is included, the text-based approach is only faster at around 100 items, and it tails off even more dramatically. I would venture a guess that if I were deserializing a more complex object instead of a primitive type, the cost of conversion would be even greater. The binary approach, on the other hand, holds fairly steady regardless of the number of items it is dealing with (across both tests, even).

Just thought it was kind of interesting if you were ever wondering about binary serialization performance statistics (albeit only for byte arrays).