Red Gate forums
SQL Data Generator 2 forum

Weighted List Generation

Jeffmn



Joined: 14 Dec 2011
Posts: 1

Posted: Wed Dec 14, 2011 6:03 pm    Post subject: Weighted List Generation

Can someone shed some light on how the weighting works in a weighted list generation?

I wrote a custom weighted list generator that imports a bunch of values from a .csv file and builds the generator XML. That .csv file has several thousand items that will ultimately be populated into a database containing many millions of rows.

My question is: how is the weighting computed? On one hand, the documentation says I need to express each item on the list as a percentage, with the values totalling 100%, but elsewhere in the documentation those values are described as ratios.

I tried expressing each of the items as a percentage, but Data Generator sets the minimum weight at 1, so that's out of the question given that I've got 1000 items in my list that I want to weight.

I then changed all the weights to larger numbers ranging from 1 up to 20,000 or so. How does this translate when my 1000-row list is used to populate a 10 million row table? I'm trying to understand what the individual weight values mean in the 10 million row databases I'm populating.
james.billings



Joined: 16 Jun 2010
Posts: 1121
Location: My desk.

Posted: Thu Dec 15, 2011 9:18 pm

Hi there,

A while ago we had a similar query about the weighted list generator, where even at a simple level it behaved unexpectedly. For instance, if you tried to generate 10 rows of values x, y and z on a 20, 20, 60 basis, you'd expect to get 2 x, 2 y and 6 z, but it would often not produce exactly this.

I queried it with the developers and apparently it's working as designed, in their words: "The values are generated at random using the weightings. Not generated in the weighted ratio then randomized."
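
To make that distinction concrete, here is a minimal sketch in standalone Python 3. It is not Data Generator's actual implementation, which isn't shown in this thread; the function names are made up for illustration. The first function draws every row independently with probability proportional to its weight, and the second builds the exact proportions first and then shuffles them.

```python
import random
from collections import Counter

# Values x, y and z on a 20, 20, 60 basis, as in the example above
weighted = [("x", 20), ("y", 20), ("z", 60)]

def draw_independently(weighted, n_rows, rng):
    """Pick each row at random, with probability proportional to its weight."""
    values = [v for v, _ in weighted]
    weights = [w for _, w in weighted]
    return rng.choices(values, weights=weights, k=n_rows)

def strict_ratio_then_shuffle(weighted, n_rows, rng):
    """Build the exact weighted proportions first, then shuffle the order.

    Uses floor division, so it returns fewer than n_rows values when
    n_rows does not divide evenly by the weights.
    """
    total = sum(w for _, w in weighted)
    rows = []
    for value, weight in weighted:
        rows.extend([value] * (n_rows * weight // total))
    rng.shuffle(rows)
    return rows

rng = random.Random()
# Counts here differ from run to run and are often not 2/2/6
print(Counter(draw_independently(weighted, 10, rng)))
# Counts here are always 2 x, 2 y and 6 z; only the order changes
print(Counter(strict_ratio_then_shuffle(weighted, 10, rng)))
```

With only 10 rows the independent draws can stray a long way from 2/2/6. Over millions of rows the observed proportions should settle close to the weights.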

As for how it works: it seems both ratios and percentages should be feasible, as the popup help states:

Quote:
For example, if you enter 2 for value Yes and 1 for value No, Yes will occur twice as many times as No in the selected column.
To specify as percentages, ensure all the weight ratios add up to 100.
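
If the weights really are treated purely as relative ratios, as the popup help suggests, then the expected number of rows for a value is the row count times that value's weight divided by the sum of all weights. The sketch below is plain Python 3 built on that assumption. The function name and the sample weights are hypothetical and not taken from the product.

```python
def expected_counts(weights, n_rows):
    """Expected rows per value, assuming weights are purely relative ratios."""
    total = sum(weights.values())
    return {value: n_rows * w / total for value, w in weights.items()}

# The popup help example: Yes is weighted twice as heavily as No
print(expected_counts({"Yes": 2, "No": 1}, 9))
# {'Yes': 6.0, 'No': 3.0}

# Hypothetical weights between 1 and 20,000 spread over a 10 million row table
print(expected_counts({"rare": 1, "common": 20000, "other": 79999}, 10_000_000))
# {'rare': 100.0, 'common': 2000000.0, 'other': 7999900.0}
```

Under that assumption only the ratio between weights matters, so the weights do not need to add up to 100. Because values are drawn at random, actual counts will vary around these expected figures.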


The new version of Data Generator has an option to use a Python script as a generator, and the developers were kind enough to produce a sample that leads to a more predictable result, which I've pasted below. Hopefully it's of some use, although I see you're actually working with a CSV file of values, so I'm not sure how easily you'll be able to convert it across.

Code:
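#Python script to generate strings in a strict ratio
__randomize__ = True

# (value, weight) pairs: each block of 10 values holds 2 xxx, 2 yyy and 6 zzz
weightedStrings = (('xxx',2), ('yyy',2), ('zzz',6))

def main(config):
    n_rows = config["n_rows"]
    return list(next_string(n_rows))

def next_string(n_rows):
    # Repeat the weighted block until exactly n_rows values have been yielded
    produced = 0
    while produced < n_rows:
        for string, weight in weightedStrings:
            for _ in range(weight):
                if produced == n_rows:
                    return
                yield string
                produced += 1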
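
For values that live in a CSV file, one possible way to build the `weightedStrings` tuple is sketched below in standalone Python 3. The file layout (one `value,weight` pair per line, no header) and the function name `load_weighted_strings` are assumptions for illustration. Whether Data Generator's Python scripting environment allows file access or provides the `csv` module is not confirmed in this thread.

```python
import csv
import io

def load_weighted_strings(csv_file):
    """Read (value, weight) rows from an open CSV file into a tuple of pairs."""
    pairs = []
    for row in csv.reader(csv_file):
        if not row:
            continue  # skip blank lines
        value, weight = row[0], int(row[1])
        pairs.append((value, weight))
    return tuple(pairs)

# In-memory stand-in for a real file with the assumed value,weight layout
sample = io.StringIO("xxx,2\nyyy,2\nzzz,6\n")
weightedStrings = load_weighted_strings(sample)
print(weightedStrings)  # (('xxx', 2), ('yyy', 2), ('zzz', 6))
```

With a real file, an open file handle would take the place of the `io.StringIO` object.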
All times are GMT + 1 Hour