The Problem with .Count()

On my spare time I’m working on a personal project mainly focused on consuming feeds and serving the aggregated content. feed-io is of course at the center of this, with peripheral apps like an HTTP API written in Golang and a MongoDB database for storage. All collections are of course indexed according to the queries so it was fine. Nevertheless, the request that serves the home page (newest items from all feeds) is getting slower and slower as time flies by. At some point, it took more than a second for 369K items, totally unaceptable.

So before digging in further, let’s describe a bit the web server. WHen receiving an HTTP Request, it does two very simple MongoDB queries, here’s the simplified version below:

fetch the fresh stuff

1
2
// without any input param from the HTTP request, query is just an empty bson.M
db.items.find(query).sort({"lastModified": -1})

count items for the query above

1
db.items.count(query)

Surprisingly, the same code gets fast when I specify filters through the HTTP API, like {language: 'en'}.

So … What’s going on ?

First thing to do is to profile slow queries. Rather easy with MongoDB:

1
db.setProfilingLevel(1);

With level set to 1, it will log only slow queries (if it’s 2, Mongo logs everything). Now, let’s hit the HTTP server once and see what’s in the system.profile collection.

1
db.system.profile.findOne()

Here’s the result:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
{
"op" : "command",
"ns" : "feedio_api.items",
"command" : {
"aggregate" : "items",
"pipeline" : [
{
"$match" : {

}
},
{
"$group" : {
"_id" : 1,
"n" : {
"$sum" : 1
}
}
}
],
"cursor" : {

},
"lsid" : {
"id" : UUID("ee34ace4-b139-4236-bf20-9b8478f11ad7")
},
"$db" : "feedio_api"
},
"keysExamined" : 0,
"docsExamined" : 368233,
"cursorExhausted" : true,
"numYield" : 368,
"nreturned" : 1,
"locks" : {
// removed, not interesting here
},
"flowControl" : {

},
"storage" : {
"data" : {
"bytesRead" : NumberLong(134366668),
"timeReadingMicros" : NumberLong(310468)
}
},
"responseLength" : 129,
"protocol" : "op_msg",
"millis" : 554,
"planSummary" : "COLLSCAN"
}

The planSummary says “COLLSCAN”, so it can only get worse over time and let’s have a look at the command … wait, what ?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
"aggregate" : "items",
"pipeline" : [
{
"$match" : {

}
},
{
"$group" : {
"_id" : 1,
"n" : {
"$sum" : 1
}
}
}
],

So … why ? I didn’t asked for that, I wanted MongoDB to count() not to actually count! I mean, I want Mongo to use its count() function, not to aggregate all documents to count them. So let’s have a look at the code:

1
2
3
4
5
6
7
func (r *ItemsRepository) Count(ctx context.Context, query database.SearchQuery) (int64, error) {
count, err := r.Database.Collection("items").CountDocuments(ctx, r.toBson(ctx, query))
if err != nil {
return 0, err
}
return count, nil
}

So far, so good: there’s a query that is converted to bson format before being sent to what I though would be a call to count(). And then the driver decides to aggregate documents, did I found a bug here ? Maybe it’s worth raising an issue on the official repository, or maybe someone else already did … I was thinking of that when something in my brain pulled the emergency brake, brutally stopping my train of thoughts: how could it be a bug ? it’s not a random behavior nor a typo in the request, it must have been implemented willingly. The best way is to check the code, more precisely its comment:

1
2
3
4
5
6
// CountDocuments returns the number of documents in the collection. For a fast count of the documents in the
// collection, see the EstimatedDocumentCount method.
//
// The filter parameter must be a document and can be used to select which documents contribute to the count. It
// cannot be nil. An empty document (e.g. bson.D{}) should be used to count all documents in the collection. This will
// result in a full collection scan.

Let’s quote the most interesting parts:

For a fast count of the documents in the collection, see the EstimatedDocumentCount method

OK, will do that

An empty document (e.g. bson.D{}) should be used to count all documents in the collection. This will result in a full collection scan.

That explains a lot!

So now, let’s just use EstimatedDocumentCount when the filter is empty:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
func (r *ItemsRepository) Count(ctx context.Context, query database.SearchQuery) (int64, error) {
bsonQuery := r.toBson(ctx, query)
if len(bsonQuery) == 0 {
return r.CountAll(ctx)
}
count, err := r.Database.Collection("items").CountDocuments(ctx, bsonQuery)
if err != nil {
return 0, err
}
return count, nil
}

func (r *ItemsRepository) CountAll(ctx context.Context) (int64, error) {
count, err := r.Database.Collection("items").EstimatedDocumentCount(ctx)
if err != nil {
return 0, err
}
return count, nil
}

Now the result:

1
Time: 48ms

🎉 🎉 🎉 🎉 🎉 🎉 🎉

OK, now the issue is solved but what’s the explanation behind all this ? Let’s dig a bit in Mongo’s official documentation, especially in that chapter called “Accuracy and Sharded Clusters”:

On a sharded cluster, the count command when run without a query predicate can result in an inaccurate count if orphaned documents exist or if a chunk migration is in progress.
To avoid these situations, on a sharded cluster, use the db.collection.aggregate() method

And so did the developers of the Golang driver, they just did the right thing according to the manual. However, it shows that it’s not always running the query we expect when writing the code. So we always need to be careful and think about what will be actually sent to the server (always true when querying a database BTW). And read the manual, it’s the smarter way to use your smartphone on the toilet.