Mempool Archive Data Guidelines
Getting Started
Each date is partitioned into its own folder, named in YYYYMMDD format. Within each date partition there are 24 files, one per two-digit hour in which the transaction events were detected (e.g. 02.csv.gz). These files are tab-delimited, gzip-compressed CSVs.
For example, to access transactions detected on June 16th, 2023 between 12:00 and 13:00, your URL would be: archive.blocknative.com/20230616/12.csv.gz
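A slice can also be previewed without saving it, which is a quick way to confirm the tab-delimited layout. A minimal sketch, reusing the June 16th example above (the exact columns depend on the archive schema):

# Stream one slice, decompress it, and show the first three lines
curl -s https://archive.blocknative.com/20230616/12.csv.gz | gunzip -c | head -n 3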
How to download
Query, download, and store the data slices locally using the steps below. Note that curl writes to stdout by default, so pass -O to save the file under its remote name:
curl -O https://archive.blocknative.com/YYYYMMDD/HH.csv.gz
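For a one-off slice, curl's built-in retry handling can stand in for a scripted loop; recent versions of curl treat 429 and 504 responses as transient and retry them automatically. A minimal sketch (replace the placeholders with a real date and hour):

# -f fails on HTTP errors; --retry handles transient responses like 429/504
curl -sf --retry 3 --retry-delay 1 -O https://archive.blocknative.com/YYYYMMDD/HH.csv.gz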
Fetching a full day of data
Here is a script you can use to download all of a day's slices to your computer. Just set the DATE variable before running it.
#!/bin/bash

# Set the date to download (YYYYMMDD)
DATE="YYYYMMDD"
DOMAIN="https://archive.blocknative.com/"
BASE_URL="${DOMAIN}${DATE}/"

# Track successful downloads
SUCCESSFUL_DOWNLOADS=0

# Loop through each hour (00 to 23)
for HOUR in {00..23}; do
    # Construct the URL and local filename for the current hour's slice
    URL="${BASE_URL}${HOUR}.csv.gz"
    FILENAME="${HOUR}.csv.gz"

    # Track retries for this slice
    RETRIES=0

    # Retry loop to handle 429 and 504 responses
    while true; do
        # Download the slice and capture the HTTP status code
        HTTP_STATUS=$(curl -s -o "$FILENAME" -w "%{http_code}" "$URL")

        if [ "$HTTP_STATUS" -eq 200 ]; then
            echo "Downloaded $FILENAME"
            ((SUCCESSFUL_DOWNLOADS++))
            break # Success; move on to the next hour
        elif [ "$HTTP_STATUS" -eq 429 ] || [ "$HTTP_STATUS" -eq 504 ]; then
            echo "Received $HTTP_STATUS. Retrying in 1 second..."
            sleep 1
            ((RETRIES++))
            if [ "$RETRIES" -ge 3 ]; then
                echo "Retry limit reached. Exiting."
                exit 1
            fi
        elif [ "$HTTP_STATUS" -eq 404 ]; then
            echo "File not found (404). Skipping $FILENAME."
            rm -f "$FILENAME" # Remove the error body curl saved
            break
        else
            echo "Error downloading $FILENAME - Status code: $HTTP_STATUS"
            rm -f "$FILENAME" # Remove the partial or empty file
            break
        fi
    done
done

if [ "$SUCCESSFUL_DOWNLOADS" -eq 24 ]; then
    echo "All slices downloaded successfully!"
else
    echo "Some slices were not downloaded successfully."
fi
Save this script to a file, for example, download_slices.sh, and make it executable using the following command:
chmod +x download_slices.sh
Then, run the script by executing:
./download_slices.sh
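Once the script finishes, it can be worth checking that every slice arrived intact before processing it. A minimal sketch using gzip's built-in integrity test, assuming the 24 hourly files sit in the current directory:

# Flag any hourly file that is missing or fails gzip's integrity check
for HOUR in {00..23}; do
    gzip -t "${HOUR}.csv.gz" 2>/dev/null || echo "${HOUR}.csv.gz is missing or corrupt"
done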
Fetching a custom range
Here is a script you can use to (1) download all hourly slices for a range of days, or (2) download specific hourly slices on a single day.
Options:
--date-range: downloads all hourly slices for every day within the range (both dates inclusive). Format: YYYYMMDD-YYYYMMDD
./download_mempool.sh --date-range YYYYMMDD-YYYYMMDD
--hour-range: downloads data for specific hours on a particular day. Format: YYYYMMDD:HH-HH
./download_mempool.sh --hour-range YYYYMMDD:HH-HH
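For instance, to fetch three full days, or just the morning of the June 16th example from earlier (both invocations are illustrative):

./download_mempool.sh --date-range 20230615-20230617
./download_mempool.sh --hour-range 20230616:00-11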
#!/bin/bash

# Parse command-line arguments
while [[ $# -gt 0 ]]; do
    key="$1"
    case $key in
        --date-range)
            DATE_RANGE="$2"
            shift; shift
            ;;
        --hour-range)
            HOUR_RANGE="$2"
            shift; shift
            ;;
        *)
            shift
            ;;
    esac
done

DOMAIN="https://archive.blocknative.com/"
SUCCESSFUL_DOWNLOADS=0
EXPECTED_DOWNLOADS=0
# Download the slices for DATE between HOUR_START and HOUR_END
download_data() {
    local DATE=$1
    local HOUR_START=$2
    local HOUR_END=$3
    local BASE_URL="${DOMAIN}${DATE}/"
    for HOUR in $(seq -w "$HOUR_START" "$HOUR_END"); do
        URL="${BASE_URL}${HOUR}.csv.gz"
        FILENAME="${DATE}_${HOUR}.csv.gz"
        RETRIES=0
        ((EXPECTED_DOWNLOADS++))
        while true; do
            HTTP_STATUS=$(curl -s -o "$FILENAME" -w "%{http_code}" "$URL")
            if [ "$HTTP_STATUS" -eq 200 ]; then
                echo "Downloaded $FILENAME"
                ((SUCCESSFUL_DOWNLOADS++))
                break
            elif [ "$HTTP_STATUS" -eq 429 ] || [ "$HTTP_STATUS" -eq 504 ]; then
                echo "Received $HTTP_STATUS. Retrying in 1 second..."
                sleep 1
                ((RETRIES++))
                if [ "$RETRIES" -ge 3 ]; then
                    echo "Retry limit reached. Exiting."
                    exit 1
                fi
            elif [ "$HTTP_STATUS" -eq 404 ]; then
                echo "File not found (404). Skipping $FILENAME."
                rm -f "$FILENAME" # Remove the error body curl saved
                break
            else
                echo "Error downloading $FILENAME - Status code: $HTTP_STATUS"
                rm -f "$FILENAME" # Remove the partial or empty file
                break
            fi
        done
    done
}
# Date Range Mode
if [ -n "$DATE_RANGE" ]; then
    IFS='-' read -ra DATES <<< "$DATE_RANGE"
    START_DATE=${DATES[0]}
    END_DATE=${DATES[1]}
    # Walk the calendar one day at a time; seq on YYYYMMDD integers would
    # produce invalid dates across month boundaries. This uses GNU date
    # (on macOS, substitute: date -j -f %Y%m%d "$DATE" -v+1d +%Y%m%d).
    DATE=$START_DATE
    while [ "$DATE" -le "$END_DATE" ]; do
        download_data "$DATE" 00 23
        DATE=$(date -d "$DATE + 1 day" +%Y%m%d)
    done
fi
# Hour Range Mode
if [ -n "$HOUR_RANGE" ]; then
    IFS=':' read -ra PARTS <<< "$HOUR_RANGE"
    DATE=${PARTS[0]}
    IFS='-' read -ra HOURS <<< "${PARTS[1]}"
    HOUR_START=${HOURS[0]}
    HOUR_END=${HOURS[1]}
    download_data "$DATE" "$HOUR_START" "$HOUR_END"
fi
if [ "$SUCCESSFUL_DOWNLOADS" -gt 0 ]; then
echo "All slices downloaded successfully!"
else
echo "Some slices were not downloaded successfully."
fi
Save this script to a file, for example, download_mempool.sh, and make it executable using the following command:
chmod +x download_mempool.sh
Then run the script with one of the invocations shown in the options above.
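After downloading, a day's hourly slices can be stitched into one uncompressed file for analysis. A minimal sketch, assuming the files follow the DATE_HOUR naming used by the script above and that each slice starts with a header row (if the files are headerless, a plain gunzip -c 20230616_*.csv.gz > 20230616.csv is enough):

# Keep the first slice whole, then append the rest without their header rows
gunzip -c 20230616_00.csv.gz > 20230616.csv
for HOUR in {01..23}; do
    gunzip -c "20230616_${HOUR}.csv.gz" | tail -n +2 >> 20230616.csv
done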