Definition and concepts you must know before troubleshooting AHM lagging issue:
Ancient History Mark -AHM:
The ancient history mark – AHM is the epoch prior to which historical data can be purged from physical storage.
Epoch:
An epoch is 64-bit number that represents a logical time stamp for the data in Vertica. Every row has an implicitly stored column that records the committed epoch.
Using below query, you can check which time periods pertain to which epochs:
oradmin=> SELECT * FROM epochs;
Current Epoch (CE):
The current epoch is the open epoch that becomes the last epoch (LE) after a COMMIT operation. The current_epoch at the time of the COMMIT is the epoch for that DML.
oradmin=> SELECT CURRENT_EPOCH FROM SYSTEM;
CURRENT_EPOCH
---------------
629415
Latest Epoch (LE):
The latest epoch is the most recently closed epoch. The current epoch after the COMMIT operation becomes the latest epoch.
Checkpoint Epoch (CPE):
Last Good Epoch (LGE):
To see the cluster last good epoch, you can use the following command:
oradmin=> SELECT GET_LAST_GOOD_EPOCH();
Last Good Epoch Does Not Advance
oradmin=>SELECT CURRENT_EPOCH, LAST_GOOD_EPOCH FROM SYSTEM ;
CURRENT_EPOCH | LAST_GOOD_EPOCH
---------------+-----------------
731384 | 721381
If you do not see the LGE advance, check if there is data in the WOS:
oradmin=>SELECT sum(wos_used_bytes) from projection_storage ;
If there is data in the WOS, force a moveout operation:
oradmin=> SELECT do_tm_task('moveout');
Ancient History Mark Does Not Advance
If the difference b/w the Last good epoch(LGE) and Ancient History mark(AHM) is huge then you need to ensure that there is not much difference as it takes some hours to recover the data.
You can check the LGE and AHM difference using below query:
oradmin=> select get_current_epoch(),get_last_good_epoch(),get_ahm_epoch(),(get_current_epoch()- get_last_good_epoch()) LGECEDiff,(get_last_good_epoch()-get_ahm_epoch()) LGEAHMDiff, get_ahm_time();
oradmin=> select get_current_epoch() CE,get_last_good_epoch() LGE,get_ahm_epoch () AHM,(get_current_epoch()- get_last_good_epoch()) CeLGDiff,(get_last_good_epoch()-get_ahm_epoch())LgeAHmDiff,get_expected_recovery_epoch();
INFO 4544: Recovery Epoch Computation:
Node Dependencies:
011 - cnt: 9448
101 - cnt: 9448
110 - cnt: 9448
111 - cnt: 5
Nodes certainly in the cluster:
Node 1(v_usagedb_node0002), epoch 1164674
Node 2(v_usagedb_node0003), epoch 1164674
Filling more nodes to satisfy node dependencies:
Data dependencies fulfilled, remaining nodes LGEs don't matter:
Node 0(v_usagedb_node0001), epoch 1164669
CE | LGE | AHM | CeLGDiff | LgeAHmDiff | get_expected_recovery_epoch
---------+---------+--------+----------+------------+-----------------------------
1164675 | 1164669 | 797307 | 6 | 367362 | 1164674
To sync AHM with LGE, execute the below command:
oradmin=> SELECT MAKE_AHM_NOW();
The above command performs the below operations:
- Advances the epoch
- Performs a moveout operation on all projections.