Tuesday, 23 December 2025

Memory-Saving Techniques to Boost Performance in ASP.NET Core Web APIs

When an ASP.NET Core Web API starts slowing down under load, the root cause is often not "CPU is too high" — it's memory pressure. Rising allocations trigger frequent garbage collections (GC), increased latency, and sometimes a steadily growing working set that ends in restarts or out-of-memory events (especially in containers).

This article walks through practical, high-impact memory-saving techniques for ASP.NET Core Web APIs and ties them together with a real-life style example of optimizing an "Orders" endpoint in an e-commerce system.

Why memory matters in Web APIs

Every request creates objects: DTOs, strings, collections, EF Core tracking graphs, serialized JSON buffers, logs, etc. If your API allocates more per request than necessary, you’ll see:

  • Higher latency (GC pauses)
  • Lower throughput (more time collecting than doing work)
  • Memory spikes during traffic bursts
  • Unstable performance over time (working set growth)

The goal isn’t “use no memory” — it’s to allocate less per request, avoid buffering large payloads, and keep caches bounded so memory stays predictable.

Real-life example scenario: “Orders API” under load

Context: An e-commerce platform exposes:

GET /api/orders/search?customerId=…

Traffic pattern:

  • 200–500 requests/sec during peak
  • Many clients ask for large date ranges
  • A small percentage of customers have tens of thousands of orders

Symptoms:

  • P95 latency jumps from 120ms to 900ms during peak
  • Gen 2 GCs become frequent
  • Memory usage climbs after each traffic spike
  • Occasional container restarts

The “before” implementation (common anti-patterns)

Typical issues:

  • Loading full entity graphs
  • Tracking enabled for read endpoints
  • Materializing full lists in memory
  • Returning massive payloads without pagination

[HttpGet("search")]
public async Task<IActionResult> Search(Guid customerId)
{
    var orders = await _db.Orders
        .Include(o => o.Items)
        .Include(o => o.Payments)
        .Where(o => o.CustomerId == customerId)
        .OrderByDescending(o => o.CreatedAt)
        .ToListAsync();

    var dto = orders.Select(o => new OrderDto
    {
        Id = o.Id,
        CreatedAt = o.CreatedAt,
        Total = o.Total,
        Items = o.Items.Select(i => new ItemDto { /* ... */ }).ToList()
    }).ToList();

    return Ok(dto);
}

This looks harmless, but at scale it causes:

  • Large allocations for List<>, nested List<>, DTO graphs
  • EF Core tracking overhead (stores snapshots, references, fixup)
  • Bigger JSON serialization buffers
  • Higher GC pressure and latency

The optimized version: memory-saving changes that matter

Below are the highest-impact changes for memory and performance.

1) Always paginate large result sets

If your endpoint can return “all orders,” it will eventually return “too many orders.”

Rule of thumb: enforce a maximum pageSize, even for internal APIs.

[HttpGet("search")]
public async Task<IActionResult> Search(Guid customerId, int page = 1, int pageSize = 50)
{
    page = Math.Max(page, 1);
    pageSize = Math.Clamp(pageSize, 1, 200);

    // ...
}

Why this saves memory: you cap the number of entities/DTOs/materialized rows in memory at once.

2) Use AsNoTracking() for read-only queries

For read endpoints, EF Core tracking is often wasted memory.

var query = _db.Orders
    .AsNoTracking()
    .Where(o => o.CustomerId == customerId);

Why this saves memory: tracking creates internal data structures for every entity row. AsNoTracking() avoids them.

3) Project directly to DTOs (avoid loading entity graphs)

Instead of Include + mapping after the fact, shape the response in the database query.

var results = await _db.Orders
    .AsNoTracking()
    .Where(o => o.CustomerId == customerId)
    .OrderByDescending(o => o.CreatedAt)
    .Skip((page - 1) * pageSize)
    .Take(pageSize)
    .Select(o => new OrderSummaryDto(
        o.Id,
        o.CreatedAt,
        o.Total,
        o.Items.Count))
    .ToListAsync();

Why this saves memory:

  • You avoid materializing full Order, Items, Payments graphs
  • You allocate fewer objects overall
  • JSON serialization is smaller and faster

Bonus: this often reduces database load too.

4) Stream results for truly large exports (avoid buffering)

Some endpoints are inherently large (exports, reports). For those, don’t return huge JSON arrays in one shot.

Options:

  • CSV export streamed to the response
  • NDJSON streaming (one JSON object per line)
  • IAsyncEnumerable streaming (careful with client expectations)

A CSV streaming example outline:

  • Write headers
  • Stream rows in batches
  • Flush periodically

This keeps memory stable because you never hold the full dataset in memory.
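The outline above can be sketched as follows. This is a minimal illustration against the same Orders set; the "export" route, batch size, and column list are assumptions, not part of the original API.

```csharp
// Sketch of a streamed CSV export: rows go to the client in batches,
// so the full result set is never materialized in memory.
[HttpGet("export")]
public async Task Export(Guid customerId)
{
    Response.ContentType = "text/csv";
    await using var writer = new StreamWriter(Response.Body);

    await writer.WriteLineAsync("Id,CreatedAt,Total"); // header row

    const int batchSize = 1000;
    var page = 0;
    while (true)
    {
        // Materialize only one batch at a time.
        var batch = await _db.Orders
            .AsNoTracking()
            .Where(o => o.CustomerId == customerId)
            .OrderBy(o => o.Id) // stable order so Skip/Take paging is deterministic
            .Skip(page * batchSize)
            .Take(batchSize)
            .Select(o => new { o.Id, o.CreatedAt, o.Total })
            .ToListAsync();

        if (batch.Count == 0) break;

        foreach (var row in batch)
            await writer.WriteLineAsync($"{row.Id},{row.CreatedAt:O},{row.Total}");

        await writer.FlushAsync(); // push the batch to the client, keep memory flat
        page++;
    }
}
```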

5) Bound your caches (don't "accidentally DoS yourself")

Unbounded in-memory caching is a classic cause of memory growth.

If you use IMemoryCache, configure size limits and set size per entry:

  • Enable SizeLimit
  • Every entry must call SetSize(…)
  • Use absolute expiration

Why this saves memory: the cache becomes self-limiting instead of growing with traffic patterns.
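A minimal sketch of the three rules above. Here cache is assumed to be an injected IMemoryCache, and LoadOrderSummaryAsync is a hypothetical loader; note that sizes are in arbitrary units you define (entries here), not bytes.

```csharp
// In Program.cs: give the cache a total size budget.
builder.Services.AddMemoryCache(options =>
{
    options.SizeLimit = 10_000; // total budget across all entries, in your own units
});

// At the call site: once SizeLimit is set, every entry must declare a size
// (otherwise the insert throws), and absolute expiration caps its lifetime.
var summary = await cache.GetOrCreateAsync($"orders:{customerId}", entry =>
{
    entry.SetSize(1);                                     // this entry costs 1 unit
    entry.SetAbsoluteExpiration(TimeSpan.FromMinutes(5)); // hard upper bound on lifetime
    return LoadOrderSummaryAsync(customerId);             // hypothetical loader
});
```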

6) Avoid request/response body buffering unless required

Middleware or logging that reads the body often forces buffering. Buffering large payloads multiplies memory use during concurrency.

Guidance:

  • Don’t enable request buffering globally
  • Don’t log entire bodies in production
  • Stream uploads/downloads
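For downloads, one way to stay un-buffered is to hand ASP.NET Core a stream and let it copy to the response in chunks. A sketch, where OpenReportStream and the route are hypothetical:

```csharp
// FileStreamResult copies the stream to the response incrementally;
// the full payload is never held in memory at once.
[HttpGet("report")]
public IActionResult Report()
{
    Stream reportStream = OpenReportStream(); // hypothetical: blob, file, DB export...
    return File(reportStream, "application/octet-stream", "report.bin");
}
```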

7) Logging: reduce hidden allocations

High-volume logging allocates:

  • formatted strings
  • structured state objects
  • scope dictionaries

Prefer structured logs:

  • Good: logger.LogInformation("Fetched {Count} orders for {CustomerId}", count, customerId);
  • Avoid: interpolated strings in hot paths, or logging huge serialized objects
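For the hottest paths, the [LoggerMessage] source generator (available since .NET 6) goes a step further and avoids the boxing and params-array allocations of the plain LogInformation overloads. A sketch; OrderLog and the event id are illustrative:

```csharp
// The generator emits a cached delegate plus an IsEnabled guard,
// so disabled log levels cost almost nothing.
public static partial class OrderLog
{
    [LoggerMessage(
        EventId = 1001,
        Level = LogLevel.Information,
        Message = "Fetched {Count} orders for {CustomerId}")]
    public static partial void FetchedOrders(ILogger logger, int count, Guid customerId);
}

// Usage in the endpoint:
// OrderLog.FetchedOrders(_logger, results.Count, customerId);
```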

Putting it together: an “after” endpoint

Here’s a more memory-friendly version of the original endpoint:

[HttpGet("search")]
public async Task<ActionResult<PagedResult<OrderSummaryDto>>> Search(
    Guid customerId,
    int page = 1,
    int pageSize = 50)
{
    page = Math.Max(page, 1);
    pageSize = Math.Clamp(pageSize, 1, 200);

    var baseQuery = _db.Orders
        .AsNoTracking()
        .Where(o => o.CustomerId == customerId);

    var total = await baseQuery.CountAsync();

    var data = await baseQuery
        .OrderByDescending(o => o.CreatedAt)
        .Skip((page - 1) * pageSize)
        .Take(pageSize)
        .Select(o => new OrderSummaryDto(
            o.Id,
            o.CreatedAt,
            o.Total,
            o.Items.Count))
        .ToListAsync();

    return Ok(new PagedResult<OrderSummaryDto>(data, total, page, pageSize));
}

public record OrderSummaryDto(Guid Id, DateTime CreatedAt, decimal Total, int ItemCount);

public record PagedResult<T>(IReadOnlyList<T> Data, int Total, int Page, int PageSize);

This version:

  • Caps payload size
  • Avoids tracking
  • Avoids loading large graphs
  • Allocates fewer objects per request

A quick checklist you can apply to most APIs

  • Pagination everywhere for list endpoints
  • AsNoTracking() for reads
  • Projection (Select) instead of Include + mapping
  • Streaming for exports and large payloads
  • Bounded caching (size + expiration)
  • No global body buffering
  • Structured logging with controlled volume

How to validate improvements (what to measure)

To prove memory optimizations, track:

  • Allocated bytes/sec
  • Gen 0/1/2 GC counts
  • P95/P99 latency
  • Working set / private bytes
  • Request rate under load

In practice, the clearest signs you're winning are lower allocations per request, fewer Gen 2 collections, and stable memory during bursts.
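One convenient way to watch these counters on a live process is the dotnet-counters global tool (a usage sketch; substitute your API's process id for <PID>):

```shell
# Install once: dotnet tool install -g dotnet-counters
# Then stream GC, allocation-rate, and working-set counters for the API:
dotnet-counters monitor --process-id <PID> --counters System.Runtime
```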

Hope you like the article. Happy Programming.
