Faking EF Core Data Access

Entity Framework Core - FAKED!

Unit tests are not supposed to access external resources - such as databases. There are a few ways to achieve this: using in-memory databases, mocking the data access layer, or faking it. Most teams I work(ed) with use either in-memory databases or mock the data access. If you do too, that's fine. There is nothing wrong with that. In this post I want to present an alternative: faked data access. And I will do so using Entity Framework Core (EF Core) as an example.

But first: let's look at the three alternatives in a bit more detail. I want to point to a few limitations with in-memory databases and mocking.

In-Memory Databases

Whenever I can, I use in-memory databases for testing. Often I can't, because:

  • there is no in-memory alternative for the database system we use in production
  • in-memory versions do not support all features of the production database (stored procedures, triggers, etc.)
  • wiping an in-memory database is too complex and restarting it between tests is slow

Mocking Data Access

I used to mock a lot more, but have since moved away from it. I prefer to test against the real implementation or write fakes instead. Mocking has a few disadvantages:

  • Mocked test doubles can become cumbersome to set up
  • Mocks can lead to brittle tests, as they are tightly coupled to the implementation
  • Mocks tend to become overly specific
  • Mocks give you a false sense of security

Mocks are fine if you use them appropriately. But I have seen (and written!) test code that looked something like this:

mockedDataRepo.Setup(da => da.GetUserById(1))
    .ReturnsAsync(new User { Id = 1, Name = "Test User" });
mockedDataRepo.Setup(da => da.GetUserById(2))
    .ThrowsAsync(new UserNotFoundException());
mockedDataRepo.Setup(da => da.AddUser(It.IsAny<User>()))
    .ReturnsAsync((User u) => { u.Id = 2; return u; });
mockedDataRepo.Setup(da => da.UpdateUser(It.Is<User>(u => u.Id == 1)))
    .ReturnsAsync(true);
mockedDataRepo.Setup(da => da.SaveAsync())
    .ThrowsAsync(new ConflictException());

// some more test code here

mockedDataRepo.Verify(da => da.UpdateUser(It.IsAny<User>()), Times.Exactly(2));
mockedDataRepo.Verify(da => da.SaveAsync(), Times.AtLeastOnce);
...

I think you get the point: if you need to do this in every test, it gets tedious quickly. This kind of code is also very error-prone, as you need to keep the setups in sync with each other and with the implementation.

Imagine a scenario where you add several Users, query several Users by ID, update some of them, etc. Sometimes the mocking setup gets so complex that it is hard to keep track of what is returned when and why. Suddenly, setting up mocks takes up more than half of the time you spend writing tests.

And then you want to change the implementation? You would have to do it all over again!

Another tricky part about mocks: what if you have a bug storing an entry, but your mock returns a valid result? For example:

mockedDataRepo.Setup(da => da.AddUser(It.IsAny<User>()))
    .ReturnsAsync((User u) => { u.Id = 2; return u; });
var addedUser = sut.AddUser(new User { Name = "New User" });
addedUser.Id.Should().Be(2);

The test passes, but what if the actual implementation does not set the ID correctly?

Faking Data Access

A fake is a type of test double that has a working implementation, but is simplified and not suitable for production. Here is an example that I used in similar form on various projects:

public class FakeDataRepository : IDataRepository
{
    protected readonly InternalStore store;
    
    protected FakeDataRepository()
    {
        store = new InternalStore();
    }
    
    // use this overloaded constructor to share one store between multiple faked repositories
    protected FakeDataRepository(InternalStore store)
    {
        this.store = store;
    }
    
    public Task<T?> GetAsync<T>(Guid id, CancellationToken cancellationToken) where T : IEntity
    {
        return Task.FromResult(store.Get<T>(id));
    }
    
    public IEnumerable<T> Filter<T>(Predicate<T> predicate) where T : IEntity
    {
        return store.GetMany<T>(predicate.Invoke);
    }
    
    public void Add<T>(T entity) where T : IEntity
    {
        store.Add(entity);
    }
    
    // ... other methods to update, delete, save changes, etc. ...
    
    public class InternalStore
    {
        private readonly Dictionary<Guid, IEntity> entities = new();
        private readonly Dictionary<Guid, IEntity> updatedEntities = new();
        private readonly Dictionary<Guid, byte[]> maxTimestamps = new();
        private readonly List<IEntity> deleted = new();
        
        public void Add(IEntity entity)
        {
            TrySetTimestamp(entity); // throws exception when timestamp is invalid
            entities.Add(entity.Id, entity);
        }
        
        public void Update<T>(T entity) where T : IEntity
        {
            if (updatedEntities.ContainsKey(entity.Id)) return;
            updatedEntities.Add(entity.Id, entity);
        }
        
        public void Delete(Guid id)
        {
            deleted.Add(entities[id]);
            entities.Remove(id);
        }
        
        public T? Get<T>(Guid id)
            where T : IEntity
        {
            // "default" instead of "null": T is not constrained to reference types
            return entities.TryGetValue(id, out var entity) ? (T)entity : default;
        }

        public T? Get<T>(Func<T, bool> predicate)
            where T : IEntity
        {
            return entities.Values.OfType<T>().SingleOrDefault(predicate.Invoke);
        }
        
        // simulates a Save() operation: entities are moved from "updated" to "entities" 
        // collection etc.
        public void Flush()
        {
            foreach (var tsEntry in maxTimestamps)
            {
                var maxTicks = CalculateMaxTicks(tsEntry.Value);
                var entity = entities.TryGetValue(tsEntry.Key, out var e) ? 
                    e : deleted.Single(de => de.Id == tsEntry.Key);
                var entityTicks = CalculateMaxTicks(entity.Timestamp);
                if (maxTicks > entityTicks)
                {
                    throw new DbUpdateConflictException();
                }
                
                var newTs = TrySetTimestamp(entity);
                if (updatedEntities.ContainsKey(tsEntry.Key))
                {
                    updatedEntities.Remove(tsEntry.Key);
                }

                MarkEntityAsModifiedWithTimestamp(entity, newTs);
            }
        }
        
        // ... other methods to query entities, get lists, handle timestamps, etc. ...
    }
}

Fakes like the one above are more work to set up initially, but they pay off quickly. And I find working with fakes much easier and more enjoyable than working with mocks. That's because in your tests you can interact with a fake just like you would with a real repository:

public class MyTest
{
    private readonly FakeUserRepository repo;
    private readonly MyService sut;
    
    public MyTest()
    {
        repo = new FakeUserRepository();
        sut = new MyService(repo);
    }
    
    [Fact]
    public async Task Updates_existing_User()
    {
        // Arrange
        var user = new User { Id = Guid.NewGuid(), Name = "Test User" };
        repo.Add(user);

        // Act
        sut.UpdateUser(user.Id, "Updated Name");

        // Assert: read the entity back through the fake's own API
        var updated = await repo.GetAsync<User>(user.Id, CancellationToken.None);
        updated!.Name.Should().Be("Updated Name");
    }
    
    public class FakeUserRepository : FakeDataRepository, IUserRepository
    {        
        public Task<User?> GetByNameAsync(string name, CancellationToken token)
        {
            // the fake does no real I/O, so the synchronous lookup is wrapped in a completed task
            return Task.FromResult(store.Get<User>(u => u.Name == name));
        }
        
        // ... other user-repo-specific methods ...
    }
}

The above example might not impress you much, but imagine a more complex scenario:

  • several users, orders, products, etc. have to be added at the beginning of the test
  • when updating a range of orders, some have to be found in the database, others have to be created by the sut
  • the database should execute a stored procedure when adding a new order (simple to add as a function to the fake)
  • some operations should fail with a conflict exception (handled by the fake's timestamp logic)
  • ...
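
The stored-procedure and conflict bullets can be sketched roughly as follows. This is a minimal, self-contained illustration and not the fake shown earlier: `Order`, `FakeOrderStore`, `ConflictException`, the `onOrderAdded` callback, and `SimulateConcurrentUpdate` are all hypothetical names, and a plain integer version stands in for the timestamp logic.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity; Version stands in for a row-version/timestamp column.
public class Order
{
    public Guid Id { get; set; } = Guid.NewGuid();
    public int OrderNumber { get; set; }
    public long Version { get; set; }
}

// Hypothetical exception type for the conflict case.
public class ConflictException : Exception
{
    public ConflictException(string message) : base(message) { }
}

public class FakeOrderStore
{
    private readonly Dictionary<Guid, Order> orders = new();
    private readonly Dictionary<Guid, long> storedVersions = new();
    private readonly Action<Order> onOrderAdded;
    private int nextOrderNumber = 1;

    // The callback plays the role of a stored procedure or trigger.
    // By default it assigns a sequential order number.
    public FakeOrderStore(Action<Order>? onOrderAdded = null)
    {
        this.onOrderAdded = onOrderAdded ?? (o => o.OrderNumber = nextOrderNumber++);
    }

    public void Add(Order order)
    {
        orders.Add(order.Id, order);              // throws on duplicate IDs
        storedVersions[order.Id] = order.Version;
        onOrderAdded(order);                      // simulated stored procedure
    }

    public void Update(Order order)
    {
        // Optimistic concurrency: the caller must hold the latest version.
        // Throws KeyNotFoundException if the order was never added.
        if (storedVersions[order.Id] != order.Version)
            throw new ConflictException($"Order {order.Id} was modified concurrently.");

        order.Version++;
        storedVersions[order.Id] = order.Version;
        orders[order.Id] = order;
    }

    // Test hook: pretend another process saved the order in the meantime.
    public void SimulateConcurrentUpdate(Guid id) => storedVersions[id]++;
}
```

In a test, calling `SimulateConcurrentUpdate(order.Id)` before the sut updates the order makes the next `Update` throw, so the conflict path is exercised without any mock setup. Passing a custom callback to the constructor lets a test swap in different "stored procedure" behavior, including one that throws.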

A side note: in this article I am using the repository pattern, because it is well known. Not all projects use it, and in at least one project I worked with a different data access pattern, which I preferred.

And a side note to the side note: repositories tend to accumulate a lot of methods over time. If you want to use repositories, consider using a generic repository for basic operations (CRUD), and implement use-case-specific repositories for higher cohesion and looser coupling.
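
One possible shape for that split, as a rough sketch: the interfaces and the `InMemoryUserRepository` class below are hypothetical and simplified (this `IEntity` has no timestamp), and only illustrate how a single fake can back both a generic CRUD interface and a narrow, use-case-specific one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Simplified, hypothetical versions of the types used in this post.
public interface IEntity { Guid Id { get; } }

public class User : IEntity
{
    public Guid Id { get; set; } = Guid.NewGuid();
    public string Name { get; set; } = "";
}

// Generic repository: basic CRUD operations only.
public interface IRepository<T> where T : class, IEntity
{
    Task<T?> GetAsync(Guid id, CancellationToken cancellationToken);
    void Add(T entity);
    void Delete(Guid id);
}

// Use-case-specific repository: just the query one feature needs.
public interface IUserLookup
{
    Task<User?> GetByNameAsync(string name, CancellationToken cancellationToken);
}

// One fake can implement both interfaces for tests.
public class InMemoryUserRepository : IRepository<User>, IUserLookup
{
    private readonly Dictionary<Guid, User> users = new();

    public Task<User?> GetAsync(Guid id, CancellationToken cancellationToken) =>
        Task.FromResult<User?>(users.TryGetValue(id, out var user) ? user : null);

    public void Add(User entity) => users.Add(entity.Id, entity);

    public void Delete(Guid id) => users.Remove(id);

    // SingleOrDefault throws if more than one user has the same name.
    public Task<User?> GetByNameAsync(string name, CancellationToken cancellationToken) =>
        Task.FromResult<User?>(users.Values.SingleOrDefault(u => u.Name == name));
}
```

A service that only looks up users by name can then depend on `IUserLookup` alone, which keeps its tests independent of any CRUD methods added to the generic interface later.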

Side note over.

Another thing I love about fakes (but this is personal preference): there is no black magic! It is easy to debug the faked code, add logging, etc. All the source code is under your control. I even tend to add unit tests for the fake itself, giving me even more confidence in my tests.

The next time you are about to mock your data access layer, consider writing a fake instead.

There are other components I like to use fakes for. External APIs are one of them, and a post on that topic will follow soon.

One final note: I used EF Core in this post, because I worked on several projects recently that used it. But I did much the same for other ORMs like JPA / Hibernate.