
Syncing is the process by which Geth catches up to the latest Ethereum block and current global state. 
There are several ways to sync a Geth node that differ in their speed, storage requirements and trust 
assumptions. This page outlines three sync configurations for full nodes and one for light nodes.

## Full nodes

There are two types of full node that use different mechanisms to sync up to the head of the chain:

### Snap (default)

A snap-synced node holds the most recent 128 block states in memory, so transactions in that range are always quickly
accessible. However, snap sync only starts processing from a relatively recent block (as opposed to genesis
for a full node). Between the initial sync block and the 128 most recent blocks, the node stores occasional
checkpoints that can be used to rebuild the state on-the-fly. This means transactions can be traced back as
far as the block that was used for the initial sync. Tracing a single transaction requires re-executing all
preceding transactions in the same block **and** all blocks between the previous stored snapshot and that block.
Snap-synced nodes are therefore functionally equivalent to full nodes, but the initial synchronization requires
a checkpoint block to sync from instead of independently verifying the chain all the way from genesis.
Snap sync then only verifies the proof-of-work and ancestor-child block progression and assumes that the
state transitions are correct rather than re-executing the transactions in each block to verify the state
changes. Snap sync is much faster than full sync. To start a node with snap sync, pass `--syncmode snap` at
startup.
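The cost of tracing depends on how far back the nearest stored checkpoint is. The Go sketch below assumes a hypothetical, sorted list of checkpoint block numbers whose post-state is on disk (Geth's real checkpoint spacing and storage format are not described on this page) and works out where replay would start and how many whole blocks must be re-executed before the traced transaction's own block.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// replayPlan describes the work needed to trace one transaction.
type replayPlan struct {
	Checkpoint uint64 // block whose post-state is available on disk
	FullBlocks uint64 // blocks to re-execute in full before the target block
	TxsInBlock int    // transactions preceding the traced one in the target block
}

// planTrace picks the latest checkpoint strictly before the target block.
// checkpoints is a hypothetical ascending list, not Geth's on-disk format.
func planTrace(checkpoints []uint64, target uint64, txIndex int) (replayPlan, error) {
	// Index of the first checkpoint >= target; everything before it is < target.
	i := sort.Search(len(checkpoints), func(j int) bool { return checkpoints[j] >= target })
	if i == 0 {
		// Nothing stored before the target: it predates the initial sync block.
		return replayPlan{}, errors.New("no checkpoint before target block")
	}
	cp := checkpoints[i-1]
	return replayPlan{Checkpoint: cp, FullBlocks: target - cp - 1, TxsInBlock: txIndex}, nil
}

func main() {
	plan, err := planTrace([]uint64{1000, 2000, 3000}, 2500, 7)
	if err != nil {
		panic(err)
	}
	fmt.Printf("start from state after block %d, replay %d blocks, then %d txs\n",
		plan.Checkpoint, plan.FullBlocks, plan.TxsInBlock)
}
```

A checkpoint at the target block itself is skipped on purpose: its post-state already includes the transaction being traced.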

Snap sync starts by downloading the headers for a chunk of blocks. Once the headers have been verified, the block
bodies and receipts for those blocks are downloaded. Next, state sync begins. In state-sync, Geth first downloads the 
leaves of the state trie for each block without the intermediate nodes along with a range proof. The state trie is 
then regenerated locally. The state download is the part of snap sync that takes the most time to complete,
and its progress can be monitored using the ETA values in the log messages. However, the blockchain is also
progressing at the same time, invalidating some of the regenerated state data. This means a 'healing' phase
is also needed to fix errors in the state. It is not possible to monitor the progress of
the state heal because the extent of the errors cannot be known until the current state has already been regenerated.
The healing has to outpace the growth of the blockchain, otherwise the node will never catch up to the current state.
The speed of state healing depends on hardware factors (disk read/write speed and internet connection) as well as
the total gas used in each block (more gas means more state changes that have to be handled).
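One way to see why healing must outpace chain growth is to treat stale state as a backlog: new blocks add to it, healing removes from it, and it only drains when the healing rate exceeds the invalidation rate. The Go sketch below is a back-of-the-envelope model with made-up figures (trie nodes healed per second, nodes dirtied per block); Geth does not report these quantities in this form, and real healing is far less predictable, which is why its progress cannot be monitored.

```go
package main

import "fmt"

// healEstimate models state healing as a backlog of stale trie nodes.
// All inputs are hypothetical illustration values, not Geth metrics.
// It returns the seconds needed to clear the backlog and whether healing
// can ever finish at these rates.
func healEstimate(backlog, healPerSec, dirtiedPerBlock, blockTimeSec float64) (float64, bool) {
	// Rate at which new blocks invalidate already-downloaded state.
	growthPerSec := dirtiedPerBlock / blockTimeSec
	net := healPerSec - growthPerSec
	if net <= 0 {
		// Chain growth keeps pace with (or beats) healing: the node never catches up.
		return 0, false
	}
	return backlog / net, true
}

func main() {
	// Fast disk and network: 5000 nodes/s healed, 12000 nodes dirtied per 12 s block.
	secs, ok := healEstimate(1_000_000, 5000, 12000, 12)
	fmt.Println(secs, ok) // 250 true
	// Slow hardware: healing (800/s) is slower than invalidation (1000/s).
	_, ok = healEstimate(1_000_000, 800, 12000, 12)
	fmt.Println(ok) // false
}
```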

In summary, snap sync progresses in the following sequence:

- download and verify headers
- download block bodies and receipts
- download raw state data
- regenerate state trie
- heal state trie to account for newly arriving data
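The sequence above can be sketched as a small state machine. The type and function names are hypothetical and only mirror the list; a real implementation may overlap some of these steps. The one non-linear transition is healing, which repeats while newly arriving blocks keep leaving parts of the state stale.

```go
package main

import "fmt"

// snapPhase enumerates the snap sync steps listed above (hypothetical names).
type snapPhase int

const (
	downloadHeaders snapPhase = iota
	downloadBodiesReceipts
	downloadState
	regenerateTrie
	healTrie
	done
)

func (p snapPhase) String() string {
	return [...]string{
		"download and verify headers",
		"download block bodies and receipts",
		"download raw state data",
		"regenerate state trie",
		"heal state trie",
		"done",
	}[p]
}

// next returns the phase after p. Healing repeats while the state is still
// stale because new blocks arrived during the previous pass.
func next(p snapPhase, stateStale bool) snapPhase {
	if p == healTrie && stateStale {
		return healTrie
	}
	if p == done {
		return done
	}
	return p + 1
}

func main() {
	// Pretend two healing passes are needed before the state is consistent.
	staleAfter := []bool{true, false}
	p := downloadHeaders
	for p != done {
		fmt.Println(p)
		stale := false
		if p == healTrie {
			stale, staleAfter = staleAfter[0], staleAfter[1:]
		}
		p = next(p, stale)
	}
}
```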

**Note** Snap sync is the default behaviour, so if the `--syncmode` value is not passed to Geth at startup, 
Geth will use snap sync. A node that is started using `snap` will switch to block-by-block sync once it has 
caught up to the head of the chain.
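The defaulting and hand-over described in this note can be summarized as a tiny decision function. `resolveSyncMode` is a hypothetical helper written for illustration only; it is not part of Geth's code or CLI, and the labels it returns are descriptive strings rather than Geth values.

```go
package main

import "fmt"

// resolveSyncMode mirrors the note above: an unset --syncmode falls back to
// snap, and a node started with snap switches to block-by-block processing
// once it has caught up with the head of the chain. Hypothetical helper.
func resolveSyncMode(flagValue string, caughtUp bool) (string, error) {
	switch flagValue {
	case "", "snap":
		if caughtUp {
			return "block-by-block", nil
		}
		return "snap", nil
	case "full", "light":
		return flagValue, nil
	default:
		return "", fmt.Errorf("unknown sync mode %q", flagValue)
	}
}

func main() {
	cases := []struct {
		flag     string
		caughtUp bool
	}{{"", false}, {"snap", true}, {"full", false}}
	for _, tc := range cases {
		mode, err := resolveSyncMode(tc.flag, tc.caughtUp)
		fmt.Printf("--syncmode=%q caughtUp=%v -> %s (err=%v)\n", tc.flag, tc.caughtUp, mode, err)
	}
}
```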

### Full

A full sync generates the current state by executing every block starting from the genesis block. A full sync
independently verifies proof-of-work and block provenance as well as all state transitions by re-executing the
transactions in the entire historical sequence of blocks. Only the most recent 128 block states are stored in a full
node - older block states are pruned periodically and represented as a series of checkpoints from which any previous
state can be regenerated on request. 128 blocks is about 25.6 minutes of history with a block time of 12 seconds.
To create a full node, pass `--syncmode full` at startup.
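The 25.6-minute figure is simply 128 × 12 s = 1536 s. The Go sketch below makes that arithmetic explicit; it assumes a constant block time, which is an idealization (missed slots make real gaps longer).

```go
package main

import (
	"fmt"
	"time"
)

// retentionWindow returns how much wall-clock history a fixed number of
// recent block states covers, assuming a constant block time.
func retentionWindow(blocks int, blockTime time.Duration) time.Duration {
	return time.Duration(blocks) * blockTime
}

func main() {
	fmt.Println(retentionWindow(128, 12*time.Second)) // 25m36s
}
```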

## Archive nodes

An archive node is a node that retains all historical data right back to genesis. There is no need to regenerate 
any data from checkpoints because all data is directly available in the node's own storage. Archive nodes are 
therefore ideal for making fast queries about historical states. At the time of writing (September 2022) a full 
archive node that stores all data since genesis occupies nearly 12 TB of disk space (keep up with the current 
size on [Etherscan](https://etherscan.io/chartsync/chainarchive)). Archive nodes are created by configuring Geth's 
garbage collection so that old data is never deleted: `geth --syncmode full --gcmode archive`. 

It is also possible to create a partial/recent archive node where the node was synced using `snap` but the state
is never pruned. This creates an archive node that saves all state data from the point that the node first syncs.
This is configured by starting Geth with `--syncmode snap --gcmode archive`.

## Light nodes

A light node syncs very quickly and stores the bare minimum of blockchain data. Light nodes only process block
headers, not entire blocks. This greatly reduces the computation time, storage and bandwidth required relative to a 
full node. This means light nodes are suitable for resource-constrained devices and can catch up to the head of the
chain much faster when they are new or have been offline for a while. The trade-off is that light nodes rely heavily 
on data served by altruistic full nodes. A light client can be used to query data from Ethereum and submit transactions, 
acting as a locally-hosted Ethereum wallet. However, because they don't keep local copies of the Ethereum state, light 
nodes can't validate blocks in the same way as full nodes - they receive a proof from the full node and verify it against their local header chain. 
To start a node in light mode, pass `--syncmode light`. Be aware that full nodes serving light data are relatively scarce,
so light nodes can struggle to find peers.
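The proof check described above can be illustrated with a Merkle inclusion proof: the serving node sends sibling hashes, and the light node recomputes a root and compares it with the root committed to in a header it already trusts. The Go sketch below swaps in a simple binary Merkle tree over SHA-256 for clarity; Ethereum's state actually uses a Merkle-Patricia trie hashed with Keccak-256, and LES message formats are not modelled. All function names are hypothetical.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

func leafHash(data []byte) []byte {
	h := sha256.Sum256(data)
	return h[:]
}

func hashPair(a, b []byte) []byte {
	h := sha256.New()
	h.Write(a)
	h.Write(b)
	return h.Sum(nil)
}

// buildLevels builds a binary Merkle tree over a power-of-two number of
// leaves and returns every level, leaves first and root last.
func buildLevels(leaves [][]byte) [][][]byte {
	level := make([][]byte, len(leaves))
	for i, l := range leaves {
		level[i] = leafHash(l)
	}
	levels := [][][]byte{level}
	for len(level) > 1 {
		up := make([][]byte, len(level)/2)
		for i := range up {
			up[i] = hashPair(level[2*i], level[2*i+1])
		}
		levels = append(levels, up)
		level = up
	}
	return levels
}

// proof is what a serving full node would send: one sibling hash per level.
func proof(levels [][][]byte, index int) [][]byte {
	var sibs [][]byte
	for _, level := range levels[:len(levels)-1] {
		sibs = append(sibs, level[index^1])
		index /= 2
	}
	return sibs
}

// verify is the light node's side: rebuild the root from the leaf and the
// siblings, then compare it with the root taken from its own header chain.
func verify(root, leaf []byte, index int, sibs [][]byte) bool {
	h := leafHash(leaf)
	for _, s := range sibs {
		if index%2 == 0 {
			h = hashPair(h, s) // current node is a left child
		} else {
			h = hashPair(s, h) // current node is a right child
		}
		index /= 2
	}
	return bytes.Equal(h, root)
}

func main() {
	leaves := [][]byte{[]byte("acct-a"), []byte("acct-b"), []byte("acct-c"), []byte("acct-d")}
	levels := buildLevels(leaves)
	root := levels[len(levels)-1][0] // stands in for a root in a trusted header
	p := proof(levels, 2)
	fmt.Println(verify(root, []byte("acct-c"), 2, p)) // true
	fmt.Println(verify(root, []byte("forged"), 2, p)) // false
}
```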

Read more about light nodes on our [LES page](/docs/interface/les.md).

## Consensus layer syncing

Now that Ethereum has switched to proof-of-stake, all consensus logic and block propagation are handled by consensus clients.
This means that syncing the blockchain is a process shared between the consensus and execution clients. Blocks are 
downloaded by the consensus client and verified by the execution client. In order for Geth to sync, it requires a header from
its connected consensus client. Geth does not import any data until it is instructed to by the consensus client. 

Once a header is available to use as a syncing target, Geth retrieves all headers between that target header and the
local header chain in reverse chronological order. These headers show that the sequence of blocks is correct because
the parent hashes link one block to the next right up to the target block. Eventually, the sync will reach a block held
in the local database, at which point the local data and the target data are considered 'linked' and there is a very high
chance the node is syncing the correct chain. Geth then downloads the block bodies, followed by the state data. The consensus
client can update the target header; as long as syncing outpaces the growth of the blockchain, the node will eventually
get in sync.

There are two ways for the consensus client to find a block header that Geth can use as a sync target: optimistic syncing and 
checkpoint syncing:

### Optimistic sync

Optimistic sync downloads blocks before the execution client has validated them. In optimistic sync the node assumes 
the data it receives from its peers is correct during the downloading phase but then retroactively verifies each 
downloaded block. Nodes are not allowed to attest or propose blocks while they are still 'optimistic' because they 
can't yet guarantee their view of the head of the chain is correct.
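The rule that optimistic nodes must not attest or propose can be sketched as a status check on the head block. The `node` type and `isValid` callback below are hypothetical simplifications of a consensus client asking its execution client for a verdict; see the linked spec for the actual rules, including how invalidated branches are handled.

```go
package main

import "fmt"

// blockStatus tracks whether the execution client has validated a block.
type blockStatus int

const (
	optimistic blockStatus = iota // imported but not yet validated
	valid
	invalid
)

// node is a much-simplified, hypothetical view of a consensus client's chain.
type node struct {
	chain  []string // block hashes, oldest first
	status map[string]blockStatus
}

// canAttestOrPropose allows validator duties only on a validated head.
func (n *node) canAttestOrPropose() bool {
	if len(n.chain) == 0 {
		return false
	}
	return n.status[n.chain[len(n.chain)-1]] == valid
}

// verify retroactively checks each optimistic block with isValid (a stand-in
// for the execution client). An invalid block and all its descendants are
// marked invalid and dropped from the chain.
func (n *node) verify(isValid func(hash string) bool) {
	for i, h := range n.chain {
		if n.status[h] != optimistic {
			continue
		}
		if !isValid(h) {
			for _, bad := range n.chain[i:] {
				n.status[bad] = invalid
			}
			n.chain = n.chain[:i]
			return
		}
		n.status[h] = valid
	}
}

func main() {
	n := &node{
		chain:  []string{"a", "b", "c"},
		status: map[string]blockStatus{"a": optimistic, "b": optimistic, "c": optimistic},
	}
	fmt.Println(n.canAttestOrPropose()) // false: head is still optimistic
	n.verify(func(string) bool { return true })
	fmt.Println(n.canAttestOrPropose()) // true
}
```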

Read more in the [optimistic sync specs](https://github.com/ethereum/consensus-specs/blob/dev/sync/optimistic.md).

### Checkpoint sync

Alternatively, the consensus client can grab a checkpoint from a trusted source, which provides a target state to sync
up to before switching to full sync and verifying each block in turn. In this mode, the node trusts that the checkpoint
is correct. There are many possible sources for this checkpoint; the gold standard is to get it out-of-band
from a trusted friend, but it could also come from block explorers or public APIs/web apps.
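After the checkpoint is accepted, every later block has to build on it. The Go sketch below shows only that chaining step: block hashes are derived from contents with SHA-256 so a block cannot be altered without changing its hash, and each block must name the previous hash as its parent, starting from a hypothetical trusted checkpoint value. Real clients additionally verify consensus rules and, on the execution side, re-execute transactions; none of that is modelled here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// block is a toy block whose hash commits to its parent and payload.
type block struct {
	Parent  string
	Payload string
}

// hash derives the block hash from its contents.
func (b block) hash() string {
	sum := sha256.Sum256([]byte(b.Parent + "|" + b.Payload))
	return hex.EncodeToString(sum[:])
}

// verifyFrom accepts blocks in order only while each one builds on the
// previous, starting from a checkpoint hash obtained from a trusted source.
// It returns how many blocks were accepted before the first broken link.
func verifyFrom(checkpoint string, blocks []block) int {
	prev := checkpoint
	for i, b := range blocks {
		if b.Parent != prev {
			return i
		}
		prev = b.hash()
	}
	return len(blocks)
}

func main() {
	checkpoint := "trusted-checkpoint-hash" // hypothetical out-of-band value
	b1 := block{Parent: checkpoint, Payload: "txs-1"}
	b2 := block{Parent: b1.hash(), Payload: "txs-2"}
	forged := block{Parent: "some-other-chain", Payload: "txs-2"}
	fmt.Println(verifyFrom(checkpoint, []block{b1, b2}))         // 2
	fmt.Println(verifyFrom(checkpoint, []block{b1, forged, b2})) // 1
}
```

The sketch also shows why the checkpoint source matters: everything accepted afterwards is only as trustworthy as that first hash.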

**Note** it is not currently possible to use a Geth light node as an execution client on proof-of-stake Ethereum.

## Summary

There are several ways to sync a Geth node. The default is to use snap sync to create a full node. Snap sync uses a
recent block that is old enough to be safe from re-orgs as its sync target. A trust-minimized alternative
is full sync, which verifies every block since genesis. Both modes drop state data older than 128 blocks, keeping only
checkpoints that enable on-request regeneration of historical states. For rapid queries of historical data an archive node
is required. Archive nodes keep local copies of all historical data right back to genesis, currently about 12 TB and growing.
The opposite extreme is a light node, which stores only block headers and requests everything else from full nodes.
These configurations are controlled by passing `full`, `snap` or `light` to `--syncmode` at startup. For an archive node,
`--syncmode` should be `full` and `--gcmode` should be set to `archive`. Currently, due to the transition to proof-of-stake,
light sync does not work (new light client protocols are being developed).
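The flag combinations on this page can be collected into a lookup of what kind of node each produces. `nodeKind` is a hypothetical helper for illustration, not a Geth function; it assumes both `--syncmode` and `--gcmode` are given explicitly and treats any other combination as unsupported.

```go
package main

import "fmt"

// nodeKind maps explicit --syncmode and --gcmode values to the node types
// described on this page. Hypothetical helper; Geth exposes no such function.
func nodeKind(syncmode, gcmode string) (string, error) {
	switch {
	case syncmode == "full" && gcmode == "archive":
		return "archive node (all state since genesis)", nil
	case syncmode == "snap" && gcmode == "archive":
		return "partial archive node (all state since the snap sync point)", nil
	case syncmode == "full" && gcmode == "full":
		return "full node via full sync (recent 128 states plus checkpoints)", nil
	case syncmode == "snap" && gcmode == "full":
		return "full node via snap sync (recent 128 states plus checkpoints)", nil
	case syncmode == "light":
		return "light node (headers only; not working on proof-of-stake)", nil
	default:
		return "", fmt.Errorf("unsupported combination --syncmode %s --gcmode %s", syncmode, gcmode)
	}
}

func main() {
	combos := [][2]string{{"snap", "full"}, {"full", "full"}, {"full", "archive"}, {"snap", "archive"}}
	for _, c := range combos {
		kind, err := nodeKind(c[0], c[1])
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("--syncmode %s --gcmode %s -> %s\n", c[0], c[1], kind)
	}
}
```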