Press ESC to close

On-Chain User Behavior Analysis Using Dune SQL & Data

Growth marketers and product analysts constantly fall into the same trap: judging dApp audiences purely by TVL or wallet balances. Seeing an address with 50 ETH triggers an immediate "high-value user" tag. This is a fundamental misunderstanding of web3 data.

In Web3 analytics, current balance is a vanity metric; the real signal is the funding source. True user behavior analytics starts with untangling wallet clusters. If your protocol hits a sudden spike of 1,000 new addresses, and a quick check shows they all got their first gas from a single CEX sub-account or a custom mixer contract within a tight time window, that isn't organic growth. You are looking at an industrial-scale Sybil farm draining liquidity or farming txs for a future airdrop.

Spotting these patterns requires analyzing wallet genesis: the timestamp of the first transaction, the funding donor, and the subsequent transfer chain.

Practical Dune SQL Query

This script isolates addresses interacting with your smart contract and extracts the absolute first transaction for each on-chain. It lets you instantly bucket users by their funding source.

-- Find the original gas source for addresses interacting with the contract
WITH target_users AS (
    SELECT DISTINCT "from" AS user_address
    FROM ethereum.transactions
    WHERE "to" = c'0x88e6a0c2ddd26feeb64f039a2c41296fcb3f5640' -- Example: Uniswap v3 USDC/WETH pool
    LIMIT 500
),
genesis_transactions AS (
    SELECT 
        t."to" AS user_address,
        t."from" AS funder,
        t.block_time,
        ROW_NUMBER() OVER (PARTITION BY t."to" ORDER BY t.block_time ASC) as rank_index
    FROM ethereum.transactions t
    JOIN target_users u ON t."to" = u.user_address
)
-- Keep only the absolute first historical transaction for each wallet
SELECT 
    g.user_address,
    g.funder AS parent_address,
    g.block_time AS creation_timestamp,
    -- Flag suspicious funders. Addresses below are known major mixers/bridges
    CASE 
        WHEN g.funder = c'0x12d66748a9176f0af33dfc592215a425441047f1' THEN 'Tornado.Cash (0.1 ETH)'
        WHEN g.funder = c'0x47ce0c6ed5b0ce3d3a51fdb1c52dc66a7c3c2936' THEN 'Tornado.Cash (1.0 ETH)'
        ELSE 'CEX Sub-account / Standard EOA'
    END AS funder_classification
FROM genesis_transactions g
WHERE g.rank_index = 1
ORDER BY g.block_time DESC;
-- Heavy query due to the global tx table JOIN. Partition by block_date for larger datasets.

Audience Segmentation: Execution Latency and Gas Logic

Behavioral analysis maps users into clean cohorts based on on-chain discipline. Forget surveys; users lie, the ledger doesn't.

One of the most effective yet underutilized segmentation parameters is Approve-to-Swap Latency—the time delta between standard token approval and the actual execution.

Average retail users rely on standard UI setups like MetaMask or Rabby. Anywhere from 15 seconds to several minutes pass between calling approve() and hitting swap() while they check slippage, hesitate, or get distracted. Bots and MEV searchers sign these atomically. Their transactions land either in the exact same block or bundled via Flashbots. If the latency between approval and transfer approaches zero, flag the address as a bot.

On-Chain Pattern Comparison

MetricAlgorithmic Address (Bot/Sybil)Retail UserWhale
Gas Optimization (Gas Price)Floor-level minimum to save costs, or extreme premiums for frontrunningStandard market rates, often overpaying due to default wallet presetsCalculated down to the gwei, frequently routing through private RPCs (Flashbots Protect) to dodge MEV
Interaction DensityHighly cyclical. Transactions fire in uniform batches, like 24-hour cron jobsErratic. Spikes track news cycles, hype, and token listingsLow frequency, high volume. Positions are held for months
Token DiversityLaser-focused on 1-2 target assets of the protocolA massive tail of illiquid tokens and memecoinsStable portfolio: native gas (ETH/BNB), WBTC, and core stablecoins

On-Chain pattern comparison
 

Cohort Retention Analysis via Dune SQL

Cohort analysis is the only way to verify Product-Market Fit and prove you aren't just burning marketing spend on one-and-done users. You need to track the behavior of users who onboarded in a specific month across subsequent periods.

Dune SQL handles this using window functions and date mathematics via DATE_DIFF.

-- Calculate monthly user retention for a specific smart contract
WITH user_activity AS (
    SELECT 
        "from" AS user_address,
        DATE_TRUNC('month', block_time) AS tx_month,
        MIN(DATE_TRUNC('month', block_time)) OVER (PARTITION BY "from") AS cohort_month
    FROM ethereum.transactions
    WHERE "to" = c'0xdac17f958d2ee523a2206206994597c13d831ec7' -- Example: USDT contract
),
cohort_volumes AS (
    -- Get baseline size for each cohort
    SELECT 
        cohort_month,
        COUNT(DISTINCT user_address) AS cohort_size
    FROM user_activity
    GROUP BY 1
),
retention_summary AS (
    -- Count unique returning users per month
    SELECT 
        a.cohort_month,
        DATE_DIFF('month', a.cohort_month, a.tx_month) AS period_index,
        COUNT(DISTINCT a.user_address) AS active_users_count
    FROM user_activity a
    GROUP BY 1, 2
)
-- Build final retention matrix
SELECT 
    r.cohort_month,
    c.cohort_size,
    r.period_index AS months_after_first_tx,
    r.active_users_count,
    ROUND(CAST(r.active_users_count AS DOUBLE) / c.cohort_size * 100, 2) AS retention_percentage
FROM retention_summary r
JOIN cohort_volumes c ON r.cohort_month = c.cohort_month
WHERE r.period_index >= 0
ORDER BY r.cohort_month ASC, r.period_index ASC;
-- Red flag: if Month 1 retention (period_index = 1) drops below 7%, product churn is lethal.

Cohort retention
 

One caveat: running this on a high-throughput system contract like USDT will generate massive cohorts. The query will likely hit timeout limits on Dune's free tier. For a custom dApp, this executes in seconds.

Hidden Events: Mining Deep Behavioral Triggers from Logs

Parsing standard transactions or traces table only scratches the surface. Deep-dive product analysis requires indexing system logs (ethereum.logs).

Why bother? Modern protocols increasingly lean on meta-transactions and account abstraction (ERC-4337). Under this architecture, the tx.from field exposed in standard transaction tables reflects the bundler or relayer address, not the end user. Evaluating standard transactions blindly will show a handful of active contracts instead of thousands of unique users.

The fix is extracting data straight from contract events. Parsing Topic 0 for the UserOperationEvent hash reveals whether a user relies on a smart wallet (Safe, Argent), hardware setups, or embedded telegram bot wallets. Mapping your audience's wallet architecture dictates your UX strategy. Retail users on smart accounts don't need a lecture on gas optimization; they need you to deploy paymaster contracts to subsidize fees via stables.

End-to-End Liquidity Tracking: Spotting LP Drain Patterns

Launching a liquidity mining campaign usually triggers a massive influx of liquidity providers (LPs). The catch is that most of this TVL is mercenary capital. Yield farmers park their funds in the pool only while the boosted APY lasts. As soon as rewards drop, they pull their liquidity instantly, crushing market depth.

Building an early warning system requires tracking more than just the raw remove-liquidity events. It means mapping where those funds flow next. Smart money and retail investors exhibit completely different behavioral footprints when taking profit.

  • Retail Addresses: Remove liquidity -> Swap LP tokens into stablecoins or native gas (ETH) -> Move funds to centralized exchanges (CEX) using standard deposit addresses.
  • Arbitrageurs and MEV Bots: Withdraw liquidity -> Instantly rotate it into a parallel pool on another protocol offering a 0.5% higher APY -> Execute everything within one or two blocks.
  • Institutional Pools: Transfer funds to multi-sig contracts or non-custodial lending protocols (Aave, Maker). They loop or supply assets as collateral instead of selling outright.

Advanced SQL Query for Tracking LP Migration

This query isolates wallets executing a liquidity withdrawal (Burn event) and tracks where those tokens land over the subsequent 24-hour window.

-- Analyzing LP post-withdrawal behavior
WITH lp_burns AS (
    SELECT 
        evt_tx_hash,
        evt_block_time AS burn_time,
        provider AS user_address,
        amount0 / 1e6 AS amount_usdc, -- Assuming token0 is USDC
        amount1 / 1e18 AS amount_weth  -- Assuming token1 is WETH
    FROM uniswap_v3_ethereum.Pair_evt_Burn
    WHERE contract_address = c'0x88e6a0c2ddd26feeb64f039a2c41296fcb3f5640' -- Target pool
      AND evt_block_time > CURRENT_DATE - INTERVAL '30' DAY
),
subsequent_txs AS (
    SELECT 
        b.user_address,
        b.burn_time,
        t.block_time AS tx_time,
        t."to" AS destination_address,
        ROW_NUMBER() OVER (PARTITION BY b.user_address, b.evt_tx_hash ORDER BY t.block_time ASC) as tx_order
    FROM lp_burns b
    JOIN ethereum.transactions t 
        ON t."from" = b.user_address 
        AND t.block_time > b.burn_time 
        AND t.block_time <= b.burn_time + INTERVAL '24' HOUR
)
-- Fetching the exact first action after removing liquidity
SELECT 
    s.user_address,
    s.burn_time,
    s.tx_time,
    s.destination_address,
    -- Labeling the destination address
    CASE 
        WHEN s.destination_address = c'0x3cd751e6b0078be393132286c442345e5dc49699' THEN 'Binance Deposit'
        WHEN s.destination_address = c'0x4675c7e5baafbfc4687441c22105873cd246be3a' THEN 'Hop Protocol Bridge'
        WHEN s.destination_address = c'0x7a250d5630b4cf539739df2c5dacb4c659f2488d' THEN 'Uniswap V2 Router'
        ELSE 'Unknown EOA / Other Protocol'
    END AS next_action_destination
FROM subsequent_txs s
WHERE s.tx_order = 1
ORDER BY s.burn_time DESC;
-- Note: edge case here. If the user fires multiple txs in the exact same second (bundle), tx_order partitioning might get messy.

Dissecting Toxic Flow and Its Impact on Product Metrics

DEXs and DeFi protocols must constantly monitor the quality of their trading volume. Volume splits into organic flow (users swapping assets for actual utility) and toxic flow (arbitrage, LVR—Loss Versus Rebalancing).

Toxic volume comes from bots. These scripts arbitrage passive LPs the second an external venue (like Binance) undergoes a sharp price move. Reporting a volume spike as a sign of organic product adoption is a classic amateur mistake. Usually, it just means the protocol's oracles are lagging, and arbitrageurs are bleeding the pools dry.

Flagging Toxic Addresses via SQL

The best way to catch this is tracking tx PnL over a 5-block horizon. If a wallet swaps, and the asset price shifts over the next 5 blocks in a way that hurts the pool but profits the swapper, that is pure arbitrage.

Wallets where toxic flow accounts for over 80% of total volume get tagged as Arbitrage_Bot in the analytics DB. When running retention or organic growth models, exclude their volume entirely. Otherwise, engagement metrics will be heavily skewed.

Data Architecture: From Raw Logs to Product Insights

Production-grade on-chain analytics cannot rely on ad-hoc queries. Real data pipelines aggregate raw primitives into custom abstractions (Spells in Dune syntax) that update on a cron schedule.

Here is a battle-tested data pipeline architecture for Web3 product teams:

[Raw On-Chain Primitives (Blocks, Transactions, Logs)]
                      │
                      ▼
[Decoded Contracts (Events & Calls)] ──► Strip system noise and reverted txs
                      │
                      ▼
[Business Logic Layer (Spells: Swaps, Deposits)] ──► Filter by Approve-to-Swap Latency
                      │
                      ▼
[Behavioral Cohorts (Production Dashboards)] ──► Product Metrics (LTV, CAC, Retention)

The Bottom Line

Stop copy-pasting Web2 frameworks into Web3 products. Unique Active Wallets (UAW) and raw transaction volume are trivial to sybil-attack for nominal gas fees. Real user behavior analytics requires auditing financial genealogy (gas funding), checking timestamps between low-level calls, and mapping capital flight patterns post-interaction. The chain never lies. You just have to write the right queries.

Oleg Filatov

As the Chief Technology Officer at EXMON Exchange, I focus on building secure, scalable crypto infrastructure and developing systems that protect user assets and privacy.

With over 15 years in cybersecurity, blockchain, and DevOps, I specialize in smart contract analysis, threat modeling, and secure system architecture.

At EXMON Academy, I share practical insights from real-world...

...

Leave a comment

Your email address will not be published. Required fields are marked *