ZAPI-747: Support updating VMAPI's moray buckets' indexes at VMAPI's startup

Resolution

Fixed: A fix for this issue is checked into the tree and tested.
(Resolution Date: 2017-06-09T23:27:57.000Z)

Fix Versions

2017-06-22 TURKISH DECOY (Release Date: 2017-06-22)

Related Issues

Related Links

Description

To paraphrase TOOLS-1510: currently, when an index needs to be added to VMAPI's moray indexes, sdcadm's "update-other" procedure needs to be updated to call a new specific hardcoded script depending on the version of the VMAPI image to which the VMAPI zone is upgraded.

This creates a hard dependency between sdcadm and VMAPI and makes adding any index to VMAPI's moray buckets tedious. I need to add a new index on a new property to VMAPI's moray bucket as part of VOLAPI-6, and I'd like to make that process a bit better.

Moreover, as @josh.clulow pointed out in TOOLS-1510, adding indexes from "sdcadm up vmapi" is not robust, since the VMAPI update procedure is considered to be complete as soon as the VMAPI zone is provisioned with the desired image. As a consequence, if the reprovision step of the update succeeded, but the addition of indexes failed, a subsequent run of VMAPI's update procedure would consider that there's nothing to update, and the indexes wouldn't be properly set up.

So it seems that the only solution to support adding indexes for new object properties (which don't require reindexing) to VMAPI in a way that allows operators to detect and act on failures is to perform that step at VMAPI's startup, and to have VMAPI respond with an HTTP error code to any ping request if the addition of indexes failed.

This way, because health checking VMAPI sends a request to its ping endpoint, operators can detect that the update failed, and can take a look at VMAPI's logs to determine that the failure was in updating indexes.
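As a rough sketch of that startup behavior (all names here are hypothetical, not VMAPI's actual code), the bucket-setup state can be tracked at startup and reported through the ping endpoint:

```javascript
// Sketch: track the moray bucket setup state so that a /ping endpoint can
// report failures. All names are hypothetical, not VMAPI's actual code.
function createBucketSetupTracker() {
    var state = { status: 'pending', error: null };
    return {
        markDone: function () { state.status = 'done'; state.error = null; },
        markFailed: function (err) { state.status = 'failed'; state.error = err; },
        // What an HTTP ping handler would return: 200 only once the
        // buckets' indexes were set up successfully, 503 otherwise so
        // that health checks detect the failed update.
        pingStatusCode: function () {
            return (state.status === 'done') ? 200 : 503;
        }
    };
}
```

With this shape, a failed index update keeps the ping endpoint returning 503 until the problem is resolved, instead of the service silently coming up in a broken state.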

Comments

Comment by Joshua M. Clulow
Created at 2016-08-11T07:57:54.000Z
In general, this sounds good to me. A few things that would be good to note:


Comment by Julien Gilli [X]
Created at 2016-08-11T16:33:42.000Z
Updated at 2016-08-11T16:34:13.000Z
It'd be good to make sure that whichever log file the bucket configuration failure is written to is checked by AMON to generate appropriate alarms

Sounds good.

We should take this opportunity to move from a "read-modify-write" update of the bucket configuration to using Moray support for bucket versions

We already had a discussion about adding bucket versioning in VMAPI where you described what would be needed in other parts of Triton to support it. I would really like to avoid introducing it as part of this work, as it seems it would significantly broaden its scope.

It seems indexes on new properties can be added safely with regards to rollbacks and upgrades without bucket versioning. Do you have a use case in mind for such indexes where bucket versioning would be required? If not I would like to consider that for future work.

We should check to make sure that, even though it's the addition of a new property, Moray does not (for whatever reason) require us to call the reindexing endpoint in order to activate the new index anyway; this might be a question for Patrick Mooney

After looking at the code, at tables' schemas after an index was added without calling the reindexing endpoint, and given that searches using these new indexes seem to work as expected, I don't think Moray does require us to call the reindexing endpoint. But I might be missing something.

If the bucket update step fails, we should almost certainly back off and then keep trying periodically, rather than just coming to rest in a broken state – if not for all failures, at least for transient failures

Absolutely, that's what I had started doing. What would be the set of transient errors for which we'd like to retry? Would any error that is not listed in the set of errors passed to updateBucket's callback be a good candidate?
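A sketch of the kind of retry loop discussed here (names hypothetical; the real change would live in VMAPI's moray initialization code): retry the bucket update with exponential backoff, and give up only on errors considered non-transient.

```javascript
// Sketch: retry a bucket update with exponential backoff, giving up only
// on errors that an (assumed) isTransient predicate rejects.
function updateBucketWithRetry(updateFn, opts, cb) {
    var delay = opts.initialDelayMs;
    function attempt() {
        updateFn(function (err) {
            if (!err) { return cb(null); }
            if (opts.isTransient && !opts.isTransient(err)) {
                // Non-transient error: surface it instead of retrying.
                return cb(err);
            }
            setTimeout(attempt, delay);
            delay = Math.min(delay * 2, opts.maxDelayMs);
        });
    }
    attempt();
}
```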

Comment by Julien Gilli [X]
Created at 2016-08-12T16:36:20.000Z
Updated at 2016-08-12T16:36:44.000Z
It'd be good to make sure that whichever log file the bucket configuration failure is checked by AMON to generate appropriate alarms

@trent.mick mentioned to me in private that AMON monitoring is not used in Triton (but it is in Manta), so it seems it should be out of scope for this ticket. @trent.mick Can you confirm? @bbennett What would be the best way for us to allow ops to be notified when an update of VMAPI's moray buckets fails?

Comment by Julien Gilli [X]
Created at 2016-10-08T00:45:27.000Z
Updated at 2016-10-08T00:45:36.000Z
Created cr.joyent.us/#/c/629 that implements what's been discussed so far. A bit of context around the changes in that CR:


Comment by Julien Gilli [X]
Created at 2016-11-08T17:47:12.000Z
Updated at 2016-11-08T17:49:03.000Z

Current state of progress

The goal of ZAPI-747 is to allow developers to add indexes to VMAPI's moray
buckets in a way that keeps the system in a consistent state at all times.

Changes in indexes configuration that are not supported should generate errors
and not let the system perform operations that would break the integrity of the
data stored in moray.

Changes to indexes configuration that are supported should leave the system
operating consistently at all times, and should not require VMAPI to be down
for longer than the migration window currently allocated when upgrading a
Triton setup.

Unfortunately, both the current state of VMAPI's data retention policy and some
limitations in the way moray handles buckets being reindexed make the latter
requirement impossible to meet without changes to at least either one of these
systems.

This document first goes into some details to present the problem and then
describes potential solutions that can be implemented in Moray and/or VMAPI.

Need for reindexing

Filters returning erroneous results silently before reindexing is complete

After adding an index in a moray bucket on a given field, and before reindexing
is complete, findobjects requests that use that field as part of a composite
filter (an 'and' or 'or' filter) will silently return bogus results.

There are two cases in which findobjects requests can silently return
erroneous results while a bucket is being reindexed: when searching through
values that were written before the new index was added, and when searching
through values written after it was added.

For both use cases, to get the expected results from the findobjects requests,
users have to reindex the bucket and wait for all entries to be reindexed.

Values of properties added before an index is added

For instance, with the following initial bucket configuration:

{
    index: {
        str_field: {
            type: 'string'
        }
    },
    options: {
        version: 1
    }
}

If the following objects are added to the bucket:

{
    str_field: 'foo',
    boolean_field: true
}

and:

{
    str_field: 'foo',
    boolean_field: false
}

and then the bucket is updated to have the following configuration:

{
    index: {
        str_field: {
            type: 'string'
        },
        boolean_field: {
            type: 'boolean'
        }
    },
    options: {
        version: 2
    }
}

searching for objects in this bucket with the filter
(&(str_field=foo)(boolean_field=false)) will return both objects.

The reason this findobjects request doesn't return the only object that
matches the filter is that, when the database table's column storing the
values for the newly indexed property does not contain any value for that
property, the values on which the filter is applied have that property deleted
(https://github.com/joyent/moray/blob/master/lib/objects/common.js#L843-L857).

Thus, the database query
(https://github.com/joyent/moray/blob/master/lib/objects/find.js#L147)
does not filter on the boolean_field property, and the objects that do not
match the filter for that field pass through.

Values of properties added after an index is added

For instance, with the following initial bucket configuration:

{
    index: {
        str_field: {
            type: 'string'
        }
    },
    options: {
        version: 1
    }
}

If the bucket is updated to have the following configuration:

{
    index: {
        str_field: {
            type: 'string'
        },
        boolean_field: {
            type: 'boolean'
        }
    },
    options: {
        version: 2
    }
}

and then the following object is added to the bucket:

{
    str_field: 'foo',
    boolean_field: true
}

searching for objects in this bucket with the filter
(&(str_field=foo)(boolean_field=true)) will not return any result.

The reason is that the check used to make sure that all returned objects
actually match the provided filter is not aware of the indexed fields' types
for fields that are not completely reindexed.

The compileQuery function
(https://github.com/joyent/moray/blob/master/lib/objects/common.js#L126-L304)
is the one responsible for updating the types of the values specified in the
findobjects request's filter
(https://github.com/joyent/moray/blob/master/lib/objects/common.js#L44-L123).

However, the compileQuery function only considers fully reindexed indexes as
valid
(https://github.com/joyent/moray/blob/master/lib/objects/common.js#L481-L500),
and thus will update the types of filters' values only for fields that
correspond to fully reindexed indexes.

The consequence is that for the following object:

{
    str_field: 'foo',
    boolean_field: true
}

the filter (&(str_field=foo)(boolean_field=true)) is able to match a value
'foo' (a string) for its property str_field, but cannot match a value
'true' (a string, when it should be a boolean true) for its property
boolean_field.

This problem only applies to indexes that have a non-string type. Filtering on
new indexes of type 'string' works as expected.
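The mismatch can be illustrated outside of moray (this is a simplified sketch, not moray's actual filtering code): a filter value arrives as the raw string 'true', and is only converted to a boolean when the index's type is known, i.e. once the field is fully reindexed.

```javascript
// Sketch of the type mismatch described above. The filter value is
// received in its raw string form; it only gets converted when the
// field's index type is known (assumed to mean "fully reindexed" here).
function matchesEquality(obj, field, rawFilterValue, indexType) {
    var filterValue = rawFilterValue;
    if (indexType === 'boolean') {
        filterValue = (rawFilterValue === 'true');
    }
    // Strict comparison: the string 'true' never equals the boolean true.
    return obj[field] === filterValue;
}

var obj = { str_field: 'foo', boolean_field: true };
```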

Related problem with non-indexed fields

Note that the same problem with filtering on non-string values exists with
fields that are not indexed (whether they are being reindexed or not).

This is already documented in the moray-test-suite repository.
I have not yet been able to find an existing JIRA ticket that describes this
problem.

The problem with non-indexed fields can be considered to be separate because it
is not directly related to ZAPI-747, and thus it will not be mentioned in the
rest of this document.

Problems with waiting for reindexing to be done

Waiting for moray buckets to be completely reindexed before being able to run
findobjects requests is not practical because, for various reasons, the
reindexing process can potentially take a long time for VMAPI's buckets. The
reindexing process could, in the future, exceed the migration window allocated
for VMAPI.

The main reason is that the reindexing process' duration is inherently
associated with the number of objects in the bucket being reindexed. The more
objects there are in a given moray bucket, the more time it takes. A system
that grows in terms of usage will have to store more objects and the reindexing
process for its moray buckets will take more time.

The second reason is that VMAPI has a data retention policy that keeps all
objects in its moray buckets, so the number of objects grows significantly over
time. With the rise of Docker usage, and potentially more short-lived Docker
containers being created over time, that growth might accelerate.

Finally, this lack of an efficient data retention policy might have an even bigger
impact in the future depending on how different data centers scale in terms of
VM objects created.

Time to reindex grows with the number of objects

The following table describes the time it takes to reindex a given number of
rows after adding one index of type 'string' on an actual hardware setup, in
the "nightly-2" datacenter:

| number of rows | reindexing time |
| 100000 | 2.5 minutes |
| 200000 | 5 minutes |
| 300000 | 8.2 minutes |
| 400000 | 12 minutes |
| 500000 | 17 minutes |
| 600000 | 23 minutes |
| 700000 | 29 minutes |
| 800000 | 40 minutes |
| 900000 | 47 minutes |
| 1000000 | 59 minutes |

We can see from this table that the time it takes to reindex a bucket seems to
grow faster than linearly with the number of objects.

These measurements were performed by running the index.js program available in
the moray-reindex-benchmark repository
(https://github.com/misterdjules/moray-reindex-benchmark) from the sdc-docker
core service zone in the nightly-2 datacenter.

The buckets did not contain any data other than the field added and reindexed.

The number of objects reindexed per reindexObjects request was 100.
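A quick way to see the superlinear growth from the table above is to compute the reindexing time per 100K rows at each measurement point; if reindexing scaled linearly, this rate would stay constant:

```javascript
// Reindexing measurements from the table above: [rows, minutes].
var measurements = [
    [100000, 2.5], [200000, 5], [300000, 8.2], [400000, 12],
    [500000, 17], [600000, 23], [700000, 29], [800000, 40],
    [900000, 47], [1000000, 59]
];

// Minutes per 100K rows at each point. A linear process would keep this
// constant; here it goes from 2.5 at 100K rows to 5.9 at 1M rows.
var ratePer100k = measurements.map(function (m) {
    return m[1] / (m[0] / 100000);
});
```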

Another factor that goes into the time it takes for a full bucket reindex to
complete is the number of fields that need to be reindexed. The more fields that
need to be reindexed, the longer it takes for the operation to complete. A
typical migration would add only a few new indexes at most, but this shows that
the time it takes for a reindex operation to complete can vary significantly
depending on code changes and the data stored. Thus it is difficult to predict
how long that process will take for any given service at any given time, or even
to determine an upper bound.

Data retention policy in VMAPI doesn't include scrubbing of old objects

VMAPI, with about 400K objects in us-east1 for its vmapi_vms bucket, is a good
example of a service using moray that might not be able to wait for reindexing
to be done before a migration can be considered complete.

The current number of VM objects stored in VMAPI's vmapi_vms moray bucket in
each datacenter is the following:

| DC | all VMs | active (non-destroyed & non-failed) |
| us-east-1 | 416659 | 4453 |
| ams1 | 183051 | 1360 |
| sw-1 | 161631 | 3075 |
| us-west-1 | 139873 | 2456 |
| us-east-2 | 109559 | 1045 |
| us-east-3 | 104613 | 783 |
| us-east-3b | 62865 | 444 |

The number of all VMs is growing constantly in most DCs because there is
currently no scrubbing of destroyed VMs. As a result, even if the current amount
of data and the typical changes made when adding indexes would make a full
reindex operation last less than the maintenance window, it might be only a
matter of time before this becomes a problem.

If we look at the number of non-destroyed and non-failed VMs, we can see it is
much lower. Working with that order of magnitude could make the requirement
that reindexing be complete before using a moray bucket acceptable.

Growth targets for Triton cloud

Even setting aside VMAPI's data retention policy, which is the main cause of
time spent reindexing its moray buckets, it is possible that, as usage of a
given Triton data center grows and more objects are stored in moray buckets,
the time it takes for a reindexObjects operation to complete increases and
becomes unacceptable for some services, even if they all implement an
efficient data retention policy.

Potential solutions

Several potential solutions are described in this section. They are ordered by
implementation complexity, from the least complex to the most complex. These
solutions are not necessarily exclusive.

Making findobjects requests using filters on reindexing fields return an error

A simple approach is to generate an error on any findobjects request that uses
filters including a reindexing field. Services using moray can then handle these
explicit errors as any other operation errors.
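A sketch of what such a check could look like on the server side (the names and the shape of the reindexing-fields list are assumptions, not moray's actual internal representation):

```javascript
// Sketch: reject findobjects filters that reference a field whose index
// is still being reindexed, so that clients get an explicit error
// instead of silently wrong results.
function checkFilterFields(filterFields, reindexingFields) {
    var bad = filterFields.filter(function (f) {
        return reindexingFields.indexOf(f) !== -1;
    });
    if (bad.length > 0) {
        return new Error('filter uses field(s) being reindexed: ' +
            bad.join(', '));
    }
    return null;
}
```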

Limitations

The main limitation of this approach is that some requests will always result in
an error for the duration of the reindexing process. As we described before, it
currently means a downtime of around 15 minutes, growing every day.

This is likely not significant for VMAPI now, but we would need to make sure
that findobjects requests using newly added indexes are not used in the future
to implement status endpoints, which determine when the service can be
considered functional. Otherwise, this would effectively mean that the
reindexing process would need to be complete before considering the service
available, which is the problem that we're trying to solve.

Making searches using filters on reindexing fields return correct results

One potential solution would be to make findobjects requests return correct
results when using filters that include reindexing fields.

This is possible because we can get access to the type of that indexed field,
since it's added to the index column of the buckets_config table when the
bucket is updated, not once the bucket is done reindexing.

It requires that the cache of the bucket configuration on the moray instance
handling a given findobjects request not be stale. Currently, the bucket
configurations cache is refreshed:

1. every 5 minutes on any moray instance

2. when an object is written to a bucket, only on the moray instance handling
the putobject request

3. when the bucket is updated, only on the moray instance handling the
updatebucket request

This is not sufficient, as a findobjects request on a reindexing bucket can be
sent to a moray instance that didn't handle the latest updatebucket request
and that hasn't handled any putobject request since the bucket was last
updated.

This can be solved by sending, along with the findobjects request, a number
that represents the required bucket version. If the required bucket version is
higher than the version of the cached bucket, then the moray instance handling
the findobjects request refreshes its cache.
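That refresh rule can be sketched as follows (names hypothetical, not moray's actual code): the handling instance reloads its cached configuration only when the client requires a newer bucket version than the one cached.

```javascript
// Sketch of the version-gated cache refresh described above: reload the
// bucket configuration only when the cache is missing or older than the
// version the client requires.
function getBucketConfig(cache, bucketName, requiredVersion, reloadFn) {
    var cached = cache[bucketName];
    if (cached === undefined || requiredVersion > cached.version) {
        cache[bucketName] = reloadFn(bucketName);
    }
    return cache[bucketName];
}
```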

A branch of moray that implements these changes is available at
https://github.com/misterdjules/moray/commits/fix-filters-when-reindexing,
along with the corresponding tests
(https://github.com/misterdjules/moray-test-suite/tree/test-fix-filters-when-reindexing).

Performance impact

With these changes, findobjects requests will perform more operations at the
JavaScript layer, such as updating the filter's values' type, to make sure that
the filtering process works correctly when not using the underlying database.

It's important to determine the performance impact of this change because it
could mean spikes in latency for moray requests for the duration of the
reindexing process.

I have tried to measure that impact by running benchmarks of findobjects
requests (https://github.com/misterdjules/moray-benchmark-search-filters) with
and without the changes described above.

The two use cases that were benchmarked are the following:

1. findobjects requests using 'and' filters that contain one indexed field and
one field that is being reindexed.

2. findobjects requests using 'and' filters that contain two indexed fields
that are both fully reindexed.

For both use cases, different moray buckets were used so that the buckets and
objects creation could be done once and the given benchmark could be run any
number of times.

Different DTrace scripts were also used to measure the performance of
findobjects requests.

Performance impact for findobjects requests with a filter using one non-reindexed field
The following DTrace script:

$ cat /var/tmp/find-not-yet-reindexed-latency.d
#pragma D option quiet

moray*:::findobjects-start
/copyinstr(arg3) == "(&(uuid=*)(not_yet_reindexed_string=sentinel))" ||
    copyinstr(arg3) == "(&(uuid=*)(not_yet_reindexed_boolean=true))" ||
    copyinstr(arg3) == "(&(uuid=*)(not_yet_reindexed_number=42))"/
{
    latency[arg0] = timestamp;
    bucket[arg0] = copyinstr(arg2);
    filter[arg0] = copyinstr(arg3);
}

moray*:::findobjects-done
/latency[arg0]/
{
    @latencies_avg[strjoin(bucket[arg0], "_avg"), filter[arg0], arg1] =
        avg((timestamp - latency[arg0]) / 1000000);

    @latencies_min[strjoin(bucket[arg0], "_min"), filter[arg0], arg1] =
        min((timestamp - latency[arg0]) / 1000000);

    @latencies_max[strjoin(bucket[arg0], "_max"), filter[arg0], arg1] =
        max((timestamp - latency[arg0]) / 1000000);

    @latencies_distribution[bucket[arg0], filter[arg0], arg1] =
        quantize((timestamp - latency[arg0]) / 1000000);

    latency[arg0] = 0;
    bucket[arg0] = 0;
    filter[arg0] = 0;
}


was used to trace the performance of filtering through 10K objects, half of
which matched filters using one fully indexed string property and one
non-reindexed non-string property.

The distribution of latencies with the current version of moray is:

moray_benchmark_unindexed_avg                       (&(uuid=*)(not_yet_reindexed_number=42))                          0              208
moray_benchmark_unindexed_avg                       (&(uuid=*)(not_yet_reindexed_boolean=true))                       0              212
moray_benchmark_unindexed_avg                       (&(uuid=*)(not_yet_reindexed_string=sentinel))                 5000              466
moray_benchmark_unindexed_min                       (&(uuid=*)(not_yet_reindexed_number=42))                          0              183
moray_benchmark_unindexed_min                       (&(uuid=*)(not_yet_reindexed_boolean=true))                       0              184
moray_benchmark_unindexed_min                       (&(uuid=*)(not_yet_reindexed_string=sentinel))                 5000              375
moray_benchmark_unindexed_max                       (&(uuid=*)(not_yet_reindexed_number=42))                          0              369
moray_benchmark_unindexed_max                       (&(uuid=*)(not_yet_reindexed_boolean=true))                       0              507
moray_benchmark_unindexed_max                       (&(uuid=*)(not_yet_reindexed_string=sentinel))                 5000             1078
moray_benchmark_unindexed                           (&(uuid=*)(not_yet_reindexed_boolean=true))                       0
        value  ------------- Distribution ------------- count    
         64 |                                         0        
        128 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   94       
        256 |@@                                       6        
        512 |                                         0        

moray_benchmark_unindexed                           (&(uuid=*)(not_yet_reindexed_number=42))                          0
        value  ------------- Distribution ------------- count    
         64 |                                         0        
        128 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    93       
        256 |@@@                                      7        
        512 |                                         0        

moray_benchmark_unindexed                           (&(uuid=*)(not_yet_reindexed_string=sentinel))                 5000
        value  ------------- Distribution ------------- count    
         128 |                                         0        
         256 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@       86       
         512 |@@@@@                                    13       
        1024 |                                         1        
        2048 |                                         0 

The distribution of latencies with moray running with the changes at
https://github.com/misterdjules/moray/tree/fix-filters-when-reindexing is:

moray_benchmark_unindexed_avg                       (&(uuid=*)(not_yet_reindexed_number=42))                       5000              454
moray_benchmark_unindexed_avg                       (&(uuid=*)(not_yet_reindexed_boolean=true))                    5000              456
moray_benchmark_unindexed_avg                       (&(uuid=*)(not_yet_reindexed_string=sentinel))                 5000              474
moray_benchmark_unindexed_min                       (&(uuid=*)(not_yet_reindexed_boolean=true))                    5000              357
moray_benchmark_unindexed_min                       (&(uuid=*)(not_yet_reindexed_number=42))                       5000              366
moray_benchmark_unindexed_min                       (&(uuid=*)(not_yet_reindexed_string=sentinel))                 5000              375
moray_benchmark_unindexed_max                       (&(uuid=*)(not_yet_reindexed_number=42))                       5000              844
moray_benchmark_unindexed_max                       (&(uuid=*)(not_yet_reindexed_string=sentinel))                 5000              959
moray_benchmark_unindexed_max                       (&(uuid=*)(not_yet_reindexed_boolean=true))                    5000             1113
moray_benchmark_unindexed                           (&(uuid=*)(not_yet_reindexed_boolean=true))                    5000
        value  ------------- Distribution ------------- count    
         128 |                                         0        
         256 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   95       
         512 |@                                        3        
        1024 |@                                        2        
        2048 |                                         0        

moray_benchmark_unindexed                           (&(uuid=*)(not_yet_reindexed_number=42))                       5000
        value  ------------- Distribution ------------- count    
         128 |                                         0        
         256 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@     91       
         512 |@@@@                                     9        
        1024 |                                         0        

moray_benchmark_unindexed                           (&(uuid=*)(not_yet_reindexed_string=sentinel))                 5000
        value  ------------- Distribution ------------- count    
         128 |                                         0        
         256 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@      87       
         512 |@@@@@                                    13       
        1024 |                                         0      

It's important to note that with the current moray version, filters using
non-string non-reindexed fields either return no objects or return all
objects. For our benchmark, they return no objects because the values we use
for the reindexed fields (42, true and 'sentinel') are all truthy.

So the only performance profile we can compare is that of findobjects requests
that filter on a non-reindexed field of type 'string'.

Without changes, we have:

moray_benchmark_unindexed_avg                       (&(uuid=*)(not_yet_reindexed_string=sentinel))                 5000              466
moray_benchmark_unindexed_min                       (&(uuid=*)(not_yet_reindexed_string=sentinel))                 5000              375
moray_benchmark_unindexed_max                       (&(uuid=*)(not_yet_reindexed_string=sentinel))                 5000             1078

moray_benchmark_unindexed                           (&(uuid=*)(not_yet_reindexed_string=sentinel))                 5000
        value  ------------- Distribution ------------- count    
         128 |                                         0        
         256 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@       86       
         512 |@@@@@                                    13       
        1024 |                                         1        
        2048 |                                         0 

and with changes:

moray_benchmark_unindexed_avg                       (&(uuid=*)(not_yet_reindexed_string=sentinel))                 5000              474
moray_benchmark_unindexed_min                       (&(uuid=*)(not_yet_reindexed_string=sentinel))                 5000              375
moray_benchmark_unindexed_max                       (&(uuid=*)(not_yet_reindexed_string=sentinel))                 5000              959

moray_benchmark_unindexed                           (&(uuid=*)(not_yet_reindexed_string=sentinel))                 5000
        value  ------------- Distribution ------------- count    
         128 |                                         0        
         256 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@      87       
         512 |@@@@@                                    13       
        1024 |                                         0      

which look like they have very similar profiles.


Performance impact for findobjects requests with a filter using only fully reindexed fields
The following DTrace script:

$ cat /var/tmp/find-reindexed-latency.d
#!/usr/sbin/dtrace -s

#pragma D option quiet

moray*:::findobjects-start
/copyinstr(arg3) == "(&(uuid=*)(reindexed_string=sentinel))" ||
    copyinstr(arg3) == "(&(uuid=*)(reindexed_boolean=true))" ||
    copyinstr(arg3) == "(&(uuid=*)(reindexed_number=42))"/
{
    latency[arg0] = timestamp;
    bucket[arg0] = copyinstr(arg2);
    filter[arg0] = copyinstr(arg3);
}

moray*:::findobjects-done
/latency[arg0]/
{
    @latencies_avg[strjoin(bucket[arg0], "_avg"), filter[arg0], arg1] =
        avg((timestamp - latency[arg0]) / 1000000);

    @latencies_min[strjoin(bucket[arg0], "_min"), filter[arg0], arg1] =
        min((timestamp - latency[arg0]) / 1000000);

    @latencies_max[strjoin(bucket[arg0], "_max"), filter[arg0], arg1] =
        max((timestamp - latency[arg0]) / 1000000);

    @latencies_distribution[bucket[arg0], filter[arg0], arg1] =
        quantize((timestamp - latency[arg0]) / 1000000);

    latency[arg0] = 0;
    bucket[arg0] = 0;
    filter[arg0] = 0;
}

was used to trace the performance of filtering through 10K objects, half of
which matched filters using two fully indexed properties: one of type 'string'
and the other of type 'string', 'number' or 'boolean'.

The distribution of latencies with the current version of moray is:

moray_benchmark_reindexed_avg                       (&(uuid=*)(reindexed_boolean=true))                            5000              364
moray_benchmark_reindexed_avg                       (&(uuid=*)(reindexed_string=sentinel))                         5000              375
moray_benchmark_reindexed_avg                       (&(uuid=*)(reindexed_number=42))                               5000              376
moray_benchmark_reindexed_min                       (&(uuid=*)(reindexed_number=42))                               5000              287
moray_benchmark_reindexed_min                       (&(uuid=*)(reindexed_boolean=true))                            5000              295
moray_benchmark_reindexed_min                       (&(uuid=*)(reindexed_string=sentinel))                         5000              299
moray_benchmark_reindexed_max                       (&(uuid=*)(reindexed_boolean=true))                            5000              548
moray_benchmark_reindexed_max                       (&(uuid=*)(reindexed_string=sentinel))                         5000              562
moray_benchmark_reindexed_max                       (&(uuid=*)(reindexed_number=42))                               5000             1340
moray_benchmark_reindexed                           (&(uuid=*)(reindexed_boolean=true))                            5000
        value  ------------- Distribution ------------- count    
         128 |                                         0        
         256 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  98       
         512 |@                                        2        
        1024 |                                         0        

moray_benchmark_reindexed                           (&(uuid=*)(reindexed_string=sentinel))                         5000
        value  ------------- Distribution ------------- count    
         128 |                                         0        
         256 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  98       
         512 |@                                        2        
        1024 |                                         0        

moray_benchmark_reindexed                           (&(uuid=*)(reindexed_number=42))                               5000
        value  ------------- Distribution ------------- count    
         128 |                                         0        
         256 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   96       
         512 |@                                        3        
        1024 |                                         1        
        2048 |                                         0        

and the distribution of latencies with the changes at
https://github.com/misterdjules/moray/tree/fix-filters-when-reindexing is:

moray_benchmark_reindexed_avg                       (&(uuid=*)(reindexed_number=42))                               5000              371
moray_benchmark_reindexed_avg                       (&(uuid=*)(reindexed_boolean=true))                            5000              372
moray_benchmark_reindexed_avg                       (&(uuid=*)(reindexed_string=sentinel))                         5000              377
moray_benchmark_reindexed_min                       (&(uuid=*)(reindexed_number=42))                               5000              281
moray_benchmark_reindexed_min                       (&(uuid=*)(reindexed_boolean=true))                            5000              307
moray_benchmark_reindexed_min                       (&(uuid=*)(reindexed_string=sentinel))                         5000              307
moray_benchmark_reindexed_max                       (&(uuid=*)(reindexed_boolean=true))                            5000              645
moray_benchmark_reindexed_max                       (&(uuid=*)(reindexed_string=sentinel))                         5000              831
moray_benchmark_reindexed_max                       (&(uuid=*)(reindexed_number=42))                               5000              966
moray_benchmark_reindexed                           (&(uuid=*)(reindexed_number=42))                               5000
        value  ------------- Distribution ------------- count    
         128 |                                         0        
         256 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  98       
         512 |@                                        2        
        1024 |                                         0        

moray_benchmark_reindexed                           (&(uuid=*)(reindexed_string=sentinel))                         5000
        value  ------------- Distribution ------------- count    
         128 |                                         0        
         256 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  97       
         512 |@                                        3        
        1024 |                                         0        

moray_benchmark_reindexed                           (&(uuid=*)(reindexed_boolean=true))                            5000
        value  ------------- Distribution ------------- count    
         128 |                                         0        
         256 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   95       
         512 |@@                                       5        
        1024 |                                         0  

Again, distribution profiles look very similar.

Limitations

Bogus count property in search results' objects
The _count field for each object returned by a findobjects operation is
computed by a SQL query whose WHERE clause includes a given field of the
findobjects filter only when that field has been fully reindexed.

Thus, if the bucket is still in the process of being reindexed, the part of the
filter that uses fields that are being reindexed will be applied not in the
WHERE clause of the SQL query, but later in the process.

Updating the value of the _count property based on which objects match the
filter applied after the SQL query has run would require processing and
counting all objects before returning them, which does not seem desirable.
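
The _count behavior described above can be sketched with a simplified,
illustrative model in plain JavaScript (all names here are hypothetical;
this is not moray's actual implementation):

```javascript
// Illustrative model of why _count is bogus while reindexing: only
// fully reindexed fields make it into the SQL WHERE clause; fields
// still being reindexed are filtered later, in process, after _count
// was already computed. All names are hypothetical.
function splitFilter(filterClauses, reindexingFields) {
    var sqlClauses = [];
    var postClauses = [];
    filterClauses.forEach(function (clause) {
        if (reindexingFields.indexOf(clause.field) !== -1) {
            postClauses.push(clause);
        } else {
            sqlClauses.push(clause);
        }
    });
    return { sql: sqlClauses, post: postClauses };
}

function matches(row, clauses) {
    return clauses.every(function (c) {
        return row[c.field] === c.value;
    });
}

// Simulated findobjects over an in-memory "bucket":
function findObjectsModel(rows, filterClauses, reindexingFields) {
    var parts = splitFilter(filterClauses, reindexingFields);
    // "SQL" stage: the WHERE clause uses only fully reindexed fields,
    // and _count is computed here, before post-filtering runs.
    var sqlResults = rows.filter(function (r) {
        return matches(r, parts.sql);
    });
    var count = sqlResults.length;
    // Post stage: clauses on reindexing fields are applied afterwards;
    // the returned objects are correct, but _count no longer matches.
    return sqlResults.filter(function (r) {
        return matches(r, parts.post);
    }).map(function (r) {
        return { value: r, _count: count };
    });
}

var results = findObjectsModel([
    { uuid: 'a', state: 'running', newfield: 'x' },
    { uuid: 'b', state: 'running', newfield: 'y' }
], [
    { field: 'state', value: 'running' },
    { field: 'newfield', value: 'x' }
], [ 'newfield' ]);
// One object comes back, but its _count reflects the SQL stage count.
```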

Currently, at least the following services rely on the _count property of
objects returned by findobjects:

  • VMAPI, to set the x-resource-count response header in responses to queries
    on the ListVms endpoint.
  • PAPI, to set the x-resource-count response header in responses to queries on
    the ListPkgs endpoint.
  • adminui, when listing users.

One way to mitigate that would be for findobjects requests to generate an
error when filters use non-reindexed fields and the no_count option is not
used. For VMAPI, it would mean that, e.g., ListVms requests that filter on
newly indexed fields would return an error for the duration of the reindexing
process. Other ListVms requests, which already had a correct x-resource-count
response header, would keep returning the same result.
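
A minimal sketch of that mitigation (a hypothetical pre-flight helper, not
moray's actual code) could look like:

```javascript
// Hypothetical pre-flight check for a findobjects request: reject
// filters on fields that are still being reindexed unless the caller
// opted out of _count with the no_count option.
function checkFindObjectsRequest(filterFields, reindexingFields, options) {
    var unsafe = filterFields.filter(function (f) {
        return reindexingFields.indexOf(f) !== -1;
    });
    if (unsafe.length > 0 && !options.no_count) {
        return new Error('filter uses fields being reindexed (' +
            unsafe.join(', ') + ') without the no_count option');
    }
    return null;
}

// A filter on an already-indexed field is always accepted:
var okErr = checkFindObjectsRequest([ 'state' ], [ 'newfield' ], {});
// A filter on a reindexing field is rejected unless no_count is set:
var rejected = checkFindObjectsRequest([ 'newfield' ], [ 'newfield' ], {});
var allowed = checkFindObjectsRequest([ 'newfield' ], [ 'newfield' ],
    { no_count: true });
```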

Scrubbing old objects from VMAPI's moray buckets

Scrubbing VMAPI's moray buckets, especially vmapi_vms, could significantly
reduce the number of objects to reindex. It seems reasonable to expect that we
could bring that number down to ~100K objects, which take around 10 minutes to
reindex.

Scrubbed objects would need to be stored in a durable storage service where that
data can be easily accessed by operators and developers, for debugging,
reporting, billing and other purposes.

It is a complex task because it is potentially a backward-incompatible change
to the interface that people currently use, through CloudAPI or VMAPI, to
access data such as inactive VMs.

Choosing a way forward for ZAPI-747

My current recommendation is to implement the changes needed to make
findobjects requests return correct results when using filters that include
fields that are being reindexed.

The data I've been able to gather so far indicates that the performance impact
is negligible, and it seems to be the only solution that works as expected and
without potentially surprising caveats for any type of index changes and any
number of objects.

In order to surface the limitation of the _count attribute of objects sent in
response to such findobjects requests, I recommend making these requests
generate an error if the no_count option is not passed.

With these changes applied to joyent/moray, the code that initializes the
storage layer for VMAPI in ZAPI-747 will be changed so that:

1. the storage layer is considered to be initialized as soon as all VMAPI moray
buckets have been updated, even if they have not yet been reindexed.
2. after all VMAPI moray buckets have been updated, they will be reindexed
concurrently, in the background (that is, the reindexing process will not
block any VMAPI operation). If reindexing objects for any of VMAPI's
moray buckets results in a non-transient error, an error will be thrown and
VMAPI will abort.
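
The two steps above could be sketched like this (initMorayStorage,
setupBucket and reindexBucket are hypothetical stand-ins for the real moray
client operations; the state names mirror the BUCKETS_SETUP_DONE and
BUCKETS_REINDEX_DONE initialization statuses):

```javascript
// Sketch of the startup sequence described above, with hypothetical
// function names standing in for the real moray client calls.
function initMorayStorage(buckets, setupBucket, reindexBucket, onInit) {
    var status = { state: 'NOT_INITIALIZED' };
    var pending = buckets.length;

    function startBackgroundReindex() {
        var left = buckets.length;
        buckets.forEach(function (bucket) {
            reindexBucket(bucket, function (err) {
                if (err) {
                    // Non-transient reindexing error: abort VMAPI.
                    throw err;
                }
                if (--left === 0) {
                    status.state = 'BUCKETS_REINDEX_DONE';
                }
            });
        });
    }

    buckets.forEach(function (bucket) {
        setupBucket(bucket, function (err) {
            if (err) {
                throw err;
            }
            if (--pending === 0) {
                // Buckets updated: the storage layer is considered
                // initialized even though reindexing has not run yet.
                status.state = 'BUCKETS_SETUP_DONE';
                onInit(status);
                // Reindexing then proceeds in the background and does
                // not block VMAPI operations.
                startBackgroundReindex();
            }
        });
    });
    return status;
}

// Demo with synchronous no-op stubs for the moray operations:
var observed = [];
var st = initMorayStorage([ 'vmapi_vms', 'vmapi_server_vms' ],
    function (b, cb) { cb(null); },
    function (b, cb) { cb(null); },
    function (s) { observed.push(s.state); });
```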

I also recommend that we:

1. plan for writing an RFD investigating how a more efficient data retention
policy can be implemented for VMAPI.

2. file a ticket for investigating and implementing how we can make findobjects
requests that use filters on unindexed fields (for which no type info is
available) return an error.

Comment by Julien Gilli [X]
Created at 2016-11-10T22:38:27.000Z
@dap mentioned MORAY-104 in a private email, and then @trent.mick brought it to my attention again (along with MANTA-893) and rightfully pointed out that filtering on reindexing objects, even if we fix the values' type problem as suggested in my previous comment, suffers from problems when paginating results.

We were unable to find a solution to that pagination issue that didn't introduce significant new problems, so it seems the only way to have findobjects requests return consistent results while a moray bucket is being reindexed is to add the ability to specify that all indexes for a given bucket at a given version are reindexed (i.e., no field for that version is present in the reindex_active column of the buckets_config table).

This should probably be designed and implemented in conjunction with MORAY-104, and probably requires going through the RFD process.

To support the current use cases needed by VOLAPI, we could support adding indexes only with type 'string' (to avoid the issue of filter instances having values of type 'string' regardless of the actual index type) and only on new values (to avoid having the parts of the filter that use the new index dropped while the bucket is reindexing).

The caveats in this situation are that, while reindexing, pagination would be broken for any findobjects request using a filter that includes an index that is not yet reindexed. Given that:

1. the default limit for moray findobjects requests is currently 1000
2. there are ~5000 active VMs at most in the largest Triton public cloud DC
3. most queries from VOLAPI to VMAPI using filters on new indexes would also filter on active VMs
4. the reindexing process would likely take less than an hour to complete

it seems it could be an acceptable trade-off: a user would need to have more than 1000 active VMs before reindexing completes for it to become a problem. On the other hand, this introduces a lot of subtle limitations, when we might instead be able to get to an actual solution that wouldn't require them.
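
The pagination breakage can be illustrated with a toy model (plain
JavaScript, not moray code): the SQL stage applies LIMIT before the clauses
on reindexing fields run, so a page can silently drop matching rows.

```javascript
// Toy model of paginated findobjects while reindexing: LIMIT applies
// in the SQL stage, but clauses on fields being reindexed are applied
// afterwards, so a page can miss matching rows beyond the SQL LIMIT.
function pageQuery(rows, sqlPred, postPred, limit, offset) {
    var sqlPage = rows.filter(sqlPred).slice(offset, offset + limit);
    return sqlPage.filter(postPred);
}

var rows = [];
for (var i = 0; i < 1500; i++) {
    rows.push({ uuid: i, state: 'running',
        newfield: (i % 2 === 0) ? 'x' : 'y' });
}

// Filter: state=running AND newfield=x, with 'newfield' still being
// reindexed. 750 rows actually match, but the first page of 1000 SQL
// rows only contains 500 of them; the other 250 are silently deferred.
var page1 = pageQuery(rows,
    function (r) { return r.state === 'running'; },
    function (r) { return r.newfield === 'x'; },
    1000, 0);
```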



Comment by Julien Gilli [X]
Created at 2016-11-10T22:40:34.000Z
There aren't enough details in MANTA-104 to determine precisely how it's related to ZAPI-747, but the two are definitely related if fields that are indexed but not fully reindexed are considered to be covered by that ticket.

Comment by Julien Gilli [X]
Created at 2017-05-19T23:03:53.000Z
Updated the CR at https://cr.joyent.us/#/c/629.

Comment by Julien Gilli [X]
Created at 2017-06-08T22:45:37.000Z
In addition to the tests that are included with the CR, I went through the following in my COAL:

1. Updated VMAPI to use the changes in the CR above.
2. Added an index on a field 'bar' with the type 'string' in the configuration of the vmapi_vms bucket
3. Added 600K objects in moray with a value 'foo' for their property 'bar'
4. Restarted VMAPI
5. Made sure that, after VMAPI started and before the whole moray initialization process was completed, the /ping endpoint responded with a status of 'OK' but with healthy set to false, and an initialization.moray.status of 'BUCKETS_SETUP_DONE' (but not 'BUCKETS_REINDEX_DONE').
6. After a couple of hours, the reindexing process was done, and I verified that the /ping endpoint responded with a status of 'OK', healthy set to true, and an initialization.moray.status of 'BUCKETS_REINDEX_DONE'.
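
A handler along the lines these /ping checks exercise might look like the
following sketch (hypothetical, not VMAPI's actual restify handler; only the
status, healthy and initialization.moray.status fields come from the
observations above):

```javascript
// Hypothetical /ping response builder: report 'OK' with healthy=false
// until reindexing completes, and an HTTP error code if the moray
// initialization failed, so health checks can surface the failure.
function pingResponse(morayStatus, morayErr) {
    if (morayErr) {
        return {
            httpCode: 503,
            body: {
                status: 'ERROR',
                healthy: false,
                initialization: { moray: { status: morayStatus,
                    error: morayErr.message } }
            }
        };
    }
    var done = (morayStatus === 'BUCKETS_REINDEX_DONE');
    return {
        httpCode: 200,
        body: {
            status: 'OK',
            healthy: done,
            initialization: { moray: { status: morayStatus } }
        }
    };
}

var during = pingResponse('BUCKETS_SETUP_DONE', null);
var after = pingResponse('BUCKETS_REINDEX_DONE', null);
var failed = pingResponse('BUCKETS_SETUP_DONE', new Error('reindex failed'));
```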


Comment by Julien Gilli [X]
Created at 2017-06-08T23:08:49.000Z
I've also updated the VMAPI core zone in nightly-1 to a build generated from the changes in the CR mentioned above, and ran the full VMAPI test suite there. The upgrade went well and all tests passed.

Comment by Julien Gilli [X]
Created at 2017-06-08T23:20:48.000Z
Updated at 2017-06-09T23:22:11.000Z
Finally, still in nightly-1, I rolled back VMAPI to the latest master, verified the rollback went well, and re-ran the full VMAPI test suite, which passed. Then I re-upgraded VMAPI to an image built from the changes that fix this ticket, and verified that the service eventually became healthy and that the moray buckets initialization process behaved as expected.

Comment by Bot Bot [X]
Created at 2017-06-09T23:27:37.000Z

sdc-vmapi commit a4f4c8f (branch master, by Julien Gilli)

ZAPI-747 Support updating VMAPI's moray buckets' indexes at VMAPI's startup
    Reviewed by: Trent Mick <trent.mick@joyent.com>
    Approved by: Trent Mick <trent.mick@joyent.com>