Nick James

How to generate test data using factories: Rosie + Faker

Introduction

Over the last couple of weeks at work, we have been revamping our automated testing process to make it more reliable and to create a better developer experience. Part of the revamp was using Rosie to define data factories and Faker to generate random test data. I will quickly outline the problems we were having so you can see why this new approach is so much better.

Before switching to data factories, all of our integration tests relied on data from our beta test environment or on data loaded into a local database from fixtures. This made test runs less reliable, because someone could alter the data a test expected to exist and cause unnecessary test failures.

The fixture data for the local database was defined in JSON files like this:

{
  "id": "4lk3j5lj4l5j44lk5hdgf",
  "first_name": "Bob",
  "last_name": "Doe",
  "email": "bob@test.com",
  "location": {
    "street": "1234 fake st",
    "city": "Realtown",
    "state": "CA",
    "country": "US",
    "zipcode": "56789"
  }
}

If we wanted to create 12 test users, we needed 12 separate objects, which caused a slew of problems. If the data schema changed, all 12 objects had to be updated by hand. If a typo crept into the data file, a field might end up with no data at all. Invalid data could also be assigned to fields, like this:

{
  "id": "4lk3j5lj4l5j44lk5hdgf",
  "first_name": "Bob",
  "last_name": "Doe",
  "email": "bob@test.com",
  "location": {
    "street": "1234 fake st",
    "city": "Realtown",
    "state": "CA",
    "country": "US",
    "zipcode": "abcde"
  }
}

That record would be loaded into the database and crash the test app, because abcde is not a valid zip code.

Lastly, when creating lots of test data, you want the data to vary in different ways. Sometimes the differences are slight, such as changing a flag from true to false to indicate that an employee is no longer active. It can be very hard to tell what the differences are when you have dozens of very large objects with only a few values changed between them.

Needless to say, we needed a better approach.

Data factories to the rescue

Switching to data factories solved all of the problems listed above and more!

  • It is much easier to keep test data in sync with schema changes since only the factory needs to be updated.
  • Test data is guaranteed to be valid as long as valid defaults were used in the factory function.
  • Code that creates test data stays very DRY.
  • When used in automated tests, it is easy to pinpoint which data a test depends on, since those values are passed in as overrides.
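To make the idea concrete before bringing in Rosie, here is a minimal plain-JavaScript sketch of a data factory. The buildUser function and its defaults are hypothetical (they simply mirror the JSON fixture above), and this is not how Rosie is implemented; it only illustrates why one definition with overrides beats twelve hand-written objects.

```javascript
// Hypothetical hand-rolled factory: illustrates the idea only, not Rosie's API.
let nextUserId = 1;

function buildUser(overrides = {}) {
  // All defaults live in one place, so a schema change is made exactly once.
  // A fresh defaults object is created per call, so users never share nested objects.
  const defaults = {
    id: String(nextUserId++),
    first_name: 'Bob',
    last_name: 'Doe',
    email: 'bob@test.com',
    active: true,
    location: {
      street: '1234 fake st',
      city: 'Realtown',
      state: 'CA',
      country: 'US',
      zipcode: '56789',
    },
  };
  // Shallow merge: any top-level field passed in overrides replaces the default.
  return { ...defaults, ...overrides };
}

// Twelve users from one definition instead of twelve JSON objects.
const users = Array.from({ length: 12 }, () => buildUser());

// The override makes it obvious what this particular test cares about.
const inactiveUser = buildUser({ active: false });

console.log(users.length, inactiveUser.active); // 12 false
```

Rosie gives you the same workflow with less plumbing; for example, its Factory.buildList method builds several objects from one definition in a single call.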

Introduction to Rosie

Now that you know some of the problems we solved by using Rosie, let's jump right into an example. Oh, and before I forget, you can find the official docs here.

Let's assume we have a blog post schema with an id, title, tags, author, body, and status. For this example, let's also say that the status field is limited to one of the following values: draft, private, trash, or published.

Here is an example blog post.

{
    id: 123456789,
    title: 'Award winning blog post on making tacos',
    tags: ['featured', 'award', 'cooking'],
    author: 'Nick James',
    body: 'Tacos are...',
    status: 'published'
}

To create such an object with Rosie, you first need to install it:

npm i rosie

Next, use the Factory singleton to define an article object and how to build it:

const { Factory } = require('rosie');

Factory.define('article').attrs({
	id: 123456789,
	title: 'Award winning blog post on making tacos',
	tags: ['featured', 'award', 'cooking'],
	author: 'Nick James',
	body: 'Tacos are...',
	status: 'published'
}); 

Now that we have told Rosie how to build an article object, let's go ahead and build one:

const article = Factory.build('article');

If you were to print out the article variable, you would get something like this

{
	id: 123456789,
	title: 'Award winning blog post on making tacos',
	tags: ['featured', 'award', 'cooking'],
	author: 'Nick James',
	body: 'Tacos are...',
	status: 'published'
}

Awesome! Now we can create as many article objects as we want simply by calling Factory.build('article').

But what would happen if we called Factory.build multiple times, like so?

const article1 = Factory.build('article');
const article2 = Factory.build('article');
const article3 = Factory.build('article');

As you might have guessed, we would end up with three separate article objects with identical values. So how do we create an article with a status of draft, or one with a different author?

Rosie makes overriding values quite easy. We simply pass an object as the second argument to the build method with the values we want to provide. That means a draft article can be created like so.

const draftArticle = Factory.build('article', { status: 'draft' });

All the other values would use the defaults provided when we defined how to build the article object.

Generating dynamic data with Rosie

So far we know how to define how objects should be created and how to override the default values. However, some fields need a unique value, and supplying a unique override by hand is error-prone and annoying. A good example of such a field is the id field from our article example above: if we are storing the articles in a database, each id probably needs to be unique.

Yet again, Rosie has got our backs. If all you need is a simple number id, you can move the id field out of the .attrs() call and into a .sequence().

Factory.define('article')
	.sequence('id')
	.attrs({
		title: 'Award winning blog post on making tacos',
		tags: ['featured', 'award', 'cooking'],
		author: 'Nick James',
		body: 'Tacos are...',
		status: 'published'
	});

The first call to .build() gives id a value of 1, the second call gives it 2, and so on.
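If it helps to picture what .sequence('id') is doing, the numbering behaves roughly like the closure-based counter below. This is a hypothetical plain-JavaScript illustration (makeSequence and buildArticle are not part of Rosie), not Rosie's actual source.

```javascript
// Hypothetical stand-in for Rosie's .sequence(): a counter captured in a closure.
function makeSequence(start = 1) {
  let current = start - 1;
  // Each call returns the next number: 1, 2, 3, ...
  return () => ++current;
}

const nextId = makeSequence();

// A toy factory that draws its id from the sequence on every build.
function buildArticle(overrides = {}) {
  return {
    id: nextId(),
    title: 'Award winning blog post on making tacos',
    ...overrides,
  };
}

const ids = [buildArticle(), buildArticle(), buildArticle()].map((a) => a.id);
console.log(ids); // [ 1, 2, 3 ]
```

Because the counter lives inside the closure, every build sees the next value without any test having to track ids itself.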

If numbers aren't what you are looking for, you can supply your own function to generate the data however you like; the function's return value is assigned to the property. In the example below, I decided to go with a UUID.

const { v4: uuid } = require('uuid'); // assumes the "uuid" npm package is installed

Factory.define('article')
	.attrs({
		id: () => uuid(),
		title: 'Award winning blog post on making tacos',
		tags: ['featured', 'award', 'cooking'],
		author: 'Nick James',
		body: 'Tacos are...',
		status: 'published'
	});
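If you would rather not add the uuid package, Node.js ships a built-in crypto.randomUUID() (available since Node 14.17). The sketch below assumes a reasonably recent Node version and uses a hypothetical plain buildArticle function in place of the Rosie factory so that it runs on its own; inside a Rosie definition, the same call would simply go in id: () => randomUUID().

```javascript
// Uses Node's built-in UUID generator instead of the uuid npm package.
const { randomUUID } = require('crypto');

// Hypothetical plain-JS equivalent of the factory above (not Rosie).
function buildArticle(overrides = {}) {
  return {
    id: randomUUID(), // a fresh version 4 UUID on every call
    title: 'Award winning blog post on making tacos',
    tags: ['featured', 'award', 'cooking'],
    author: 'Nick James',
    body: 'Tacos are...',
    status: 'published',
    ...overrides,
  };
}

const first = buildArticle();
const second = buildArticle();
console.log(first.id !== second.id); // true
```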

Rosie + Faker

Cool cool cool. Now that we know how to define factory functions and generate dynamic data, I can finally introduce Faker.

Faker is a fantastic library for generating test data, with methods organized in a sensible way. Some of the types of data you can generate are addresses, names, emails, and dates. For a comprehensive list, be sure to check out the documentation here.

The primary reason I use Faker is to add a bit of randomness to my test data. For example, if I use the factories to generate a bunch of test employees, it is highly unlikely that they will all have the same name, age, address, etc.

Still using the article example, let's combine everything we have learned so far and convert it to use Faker:

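const { Factory } = require('rosie');
// These calls follow the faker 5.x API; @faker-js/faker renamed some of them
// (e.g. faker.string.uuid() and faker.person.fullName()).
const faker = require('faker');

const statuses = ['draft', 'private', 'trash', 'published'];

Factory.define('article').attrs({
	id: () => faker.random.uuid(),
	title: () => faker.lorem.sentence(),
	tags: () => [faker.lorem.word(), faker.lorem.word()],
	author: () => faker.name.findName(),
	body: () => faker.lorem.paragraph(),
	// Pick one of the allowed status values at random
	status: () => statuses[Math.floor(Math.random() * statuses.length)]
});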

You will notice that every property now uses a function to generate its data, so each new article gets different values. I am also randomly assigning a valid status value using some simple math.

Now if I want to generate a unique article, I can call Factory.build('article') without any overrides. Plus, I can still add overrides when I need them, just as we did above!
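The random status trick and the "overrides still win" behavior can be checked without any dependencies. Below, pickRandom and buildArticle are hypothetical helpers that mimic the Faker-backed factory using only Math.random; they sketch the behavior and are not Rosie's or Faker's code.

```javascript
const STATUSES = ['draft', 'private', 'trash', 'published'];

// Math.random() returns a value in [0, 1), so the index is always 0..length-1.
function pickRandom(items) {
  return items[Math.floor(Math.random() * items.length)];
}

// Hypothetical stand-in for Factory.build('article', overrides).
function buildArticle(overrides = {}) {
  return {
    status: pickRandom(STATUSES),
    ...overrides, // explicit values replace the random defaults
  };
}

// Every generated status is one of the allowed values...
const sample = Array.from({ length: 100 }, () => buildArticle());
console.log(sample.every((a) => STATUSES.includes(a.status))); // true

// ...and an explicit override always beats the random default.
console.log(buildArticle({ status: 'draft' }).status); // draft
```

Faker also ships a helper for picking an element (faker.random.arrayElement in faker 5.x, faker.helpers.arrayElement in @faker-js/faker), which may read more clearly than the manual math.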

I hope this was helpful! In my next post, I want to show how I use these data factories in real integration tests to create data in the database.
