Static Analysis for EVM Contract Selectors

Published: May 18, 2025

A few days ago Zellic released an updated version of their smart contract dataset. Like its predecessor, the new version, all-ethereum-contracts, contains all historical smart contract deployments up until recently (February 2025 in this case); the difference is that it now provides the raw bytecode instead of the original source code. Let's use this new dataset to build a simple static analysis script that extracts ABI selectors from bytecode.

ABI selectors

Readers are expected to already know about the Application Binary Interface and its selectors, but here is a high-level explanation. The ABI is the standard interface that users and smart contracts use to communicate with each other. Part of the ABI are the selectors, which are used to distinguish between the different functions and logs within a smart contract.

You can read more about function selectors here and event selectors here. The ABI spec of Solidity (which Vyper also uses) can be read here.
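To make this concrete: a function selector is the first four bytes of the keccak-256 hash of the canonical signature, and callers place it at the very start of calldata. The sketch below uses the well-known ERC-20 pair `transfer(address,uint256)` → `0xa9059cbb`; the argument values are purely illustrative.

```python
# A call's function selector is simply the first 4 bytes of calldata.
# keccak256("transfer(address,uint256)")[:4] == a9059cbb (the well-known
# ERC-20 transfer selector); the two 32-byte words after it are the
# ABI-encoded arguments (illustrative values).
calldata = bytes.fromhex(
    "a9059cbb"                                                          # selector
    "0000000000000000000000001111111111111111111111111111111111111111"  # address arg
    "0000000000000000000000000000000000000000000000000de0b6b3a7640000"  # uint256 arg (1e18)
)

selector = calldata[:4]
print(selector.hex())  # a9059cbb
```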

Existing solutions

There are multiple existing solutions already (in no specific order):

Some of them use static analysis and some use dynamic analysis to infer the selectors from bytecode. Some also try to infer the full ABI JSON, but in this post we focus only on retrieving the selectors. That said, most implementations just use a pre-image dictionary to resolve the full ABI from the selector (the same technique can be used with our version).

Known compiler logic

The log selector logic works similarly in both Solidity and Vyper. The same is true for the function selectors, but the jump table where they are used differs significantly between the compilers. Vyper recently published a write-up on how their constant-time jump tables work. Solidity hasn't really published any write-ups as far as I know, but if you look inside the source code you can see some patterns mentioned.
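For a concrete idea of the Solidity side: the classic (unoptimized) dispatcher compares the calldata selector against each function selector with a sequence like `DUP1 PUSH4 <selector> EQ PUSH2 <dest> JUMPI`. A naive byte-level scan for that shape is sketched below; note it can false-positive inside push data, which is exactly why the script later in this post decodes opcodes first.

```python
# Minimal sketch: scan raw bytecode for the classic (unoptimized) Solidity
# dispatch pattern  DUP1 PUSH4 <selector> EQ PUSH2 <dest> JUMPI.
# Opcode values: DUP1=0x80, PUSH4=0x63, EQ=0x14, PUSH2=0x61, JUMPI=0x57.
DUP1, PUSH4, EQ, PUSH2, JUMPI = 0x80, 0x63, 0x14, 0x61, 0x57

def find_dispatch_selectors(code: bytes) -> list[str]:
    found = []
    for i in range(len(code) - 10):
        if (code[i] == DUP1 and code[i + 1] == PUSH4
                and code[i + 6] == EQ and code[i + 7] == PUSH2
                and code[i + 10] == JUMPI):
            found.append(code[i + 2 : i + 6].hex())  # the 4 PUSH4 data bytes
    return found

# Hand-built dispatch fragment comparing against 0xa9059cbb:
fragment = bytes([DUP1, PUSH4, 0xa9, 0x05, 0x9c, 0xbb, EQ, PUSH2, 0x01, 0x00, JUMPI])
print(find_dispatch_selectors(fragment))  # ['a9059cbb']
```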

Our approach

Instead of relying on what the developers tell us, we will look at the bytecode itself. Since we know the ABI ahead of time, we can see where and how the selectors are placed within the bytecode. We then slide a window over the opcodes to find recurring patterns, keep the most common ones, and build a script around them to evaluate the approach.

Dataset

Sadly there isn't any good bytecode dataset for this that covers all major compiler versions for both solc and Vyper, at least none that I know of which isn't outdated. I therefore created one: it samples block intervals from the all-ethereum-contracts dataset and deduplicates on the provided bytecode hash. We then get the compiler version from Etherscan based on the address.
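The construction can be sketched roughly as follows. The field names and rows below are hypothetical stand-ins; the real pipeline reads the actual dataset and queries Etherscan for the compiler version.

```python
# Sketch of the dataset construction: sample contract rows, then dedupe
# on a hash of the deployed bytecode. Field names are hypothetical.
import hashlib
import random

contracts = [  # stand-in for rows sampled from all-ethereum-contracts
    {"address": "0xaaa...", "bytecode": b"\x60\x80\x60\x40"},
    {"address": "0xbbb...", "bytecode": b"\x60\x80\x60\x40"},  # duplicate code
    {"address": "0xccc...", "bytecode": b"\x60\x01\x60\x02"},
]

seen, deduped = set(), []
for row in random.sample(contracts, len(contracts)):  # random sample order
    digest = hashlib.sha256(row["bytecode"]).hexdigest()
    if digest not in seen:
        seen.add(digest)
        deduped.append(row)

print(len(deduped))  # 2 (the two identical bytecodes collapse into one)
```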

We then ended up with the following:

We post-process this with the verified contract response from Etherscan to extract the ABI selectors from the returned ABI.
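A sketch of that post-processing step: canonical signature strings are built from the ABI items, and hashing each signature with keccak-256 (omitted here, since it requires a third-party library) then yields the 4-byte function and 32-byte event selectors. Nested tuple types would need recursive expansion, which this sketch skips.

```python
# Sketch: turning a verified contract's ABI JSON (as returned by e.g.
# Etherscan) into canonical signature strings. Tuple types would need
# recursive expansion; this simplified version handles flat types only.
import json

abi = json.loads("""[
  {"type": "function", "name": "transfer",
   "inputs": [{"type": "address"}, {"type": "uint256"}]},
  {"type": "event", "name": "Transfer",
   "inputs": [{"type": "address"}, {"type": "address"}, {"type": "uint256"}]}
]""")

def signatures(abi_items):
    for item in abi_items:
        if item["type"] in ("function", "event"):
            args = ",".join(arg["type"] for arg in item["inputs"])
            yield f'{item["name"]}({args})'

print(list(signatures(abi)))
# ['transfer(address,uint256)', 'Transfer(address,address,uint256)']
```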

Code for finding the patterns

            
from evm import get_opcodes_from_bytecode, PushOpcode, JUMP_DEST
from collections import defaultdict
from copy import deepcopy
from tqdm import tqdm
import random
import glob
import json
import os

WINDOW_SIZE = 16
MAX_PATTERNS = 10
FOLDER_PATH = os.environ.get("FOLDER_PATH")
assert FOLDER_PATH is not None

patterns = {
    "solc": defaultdict(int),
    "vyper": defaultdict(int),
}


def transform_opcodes_window(bytecode, opcodes, selectors, index):
    opcodes_window = opcodes[index : index + WINDOW_SIZE]
    min_index = float("inf")
    for i, op in enumerate(opcodes_window):
        if not isinstance(op, PushOpcode):
            continue
        op_args_int = int.from_bytes(bytes.fromhex(op.args), byteorder="big")
        if op.args in selectors["functions"]:
            opcodes_window[i] = "<func_selector>"
            min_index = min(i, min_index)
        elif op.args in selectors["events"]:
            opcodes_window[i] = "<log_selector>"
            min_index = min(i, min_index)
        elif op_args_int < len(bytecode) and bytecode[op_args_int] == JUMP_DEST:
            # Push values that point at a JUMPDEST are jump targets, not data.
            opcodes_window[i] = "<jumpdest>"
        else:
            opcodes_window[i] = f"{op.name} <data>"
    return opcodes_window, min_index


def main():
    for file in tqdm(glob.glob(os.path.join(FOLDER_PATH, "**/*.json"), recursive=True)):
        with open(file, "r") as fp:
            data = json.load(fp)
            bytecode = data["bytecode"]
            # lstrip("0x") would also strip leading "0" hex digits;
            # removeprefix only drops the literal "0x" prefix.
            bytecode = bytes.fromhex(bytecode.removeprefix("0x"))
            selectors = data["selectors"]
            if selectors is None:
                continue
        compiler = data["compiler"]["kind"]
        opcodes = get_opcodes_from_bytecode(bytecode)
        for index in range(len(opcodes)):
            opcodes_window, min_index = transform_opcodes_window(
                bytecode, opcodes, selectors, index
            )
            if min_index == float("inf"):
                continue

            # Count the window plus every shorter variant, trimming from the
            # end first and then from the start, so partial patterns count too.
            opcodes_window_og = deepcopy(opcodes_window)
            while len(opcodes_window) > 2 and (
                "<func_selector>" in opcodes_window
                or "<log_selector>" in opcodes_window
            ):
                current_window = " ".join(map(str, opcodes_window))
                patterns[compiler][current_window] += 1
                opcodes_window = opcodes_window[:-1]

            opcodes_window = opcodes_window_og
            while len(opcodes_window) > 2 and (
                "<func_selector>" in opcodes_window
                or "<log_selector>" in opcodes_window
            ):
                current_window = " ".join(map(str, opcodes_window))
                patterns[compiler][current_window] += 1
                opcodes_window = opcodes_window[1:]

    compiler_patterns = {}
    for compiler in patterns:
        compiler_patterns[compiler] = []
        for pattern in sorted(
            patterns[compiler],
            key=patterns[compiler].get,
            reverse=True,
        ):
            for v in compiler_patterns[compiler]:
                # If it's a subset (or superset) of a kept pattern, skip it.
                if v in pattern or pattern in v:
                    break
            else:
                compiler_patterns[compiler].append(pattern)

            if len(compiler_patterns[compiler]) >= MAX_PATTERNS:
                break
    print(json.dumps(compiler_patterns, indent=4))


if __name__ == "__main__":
    main()
            
        

This gives us the following patterns:

solc

vyper

There is one obvious pattern we are missing: the case where the selector is 0x00000000, which the compiler usually optimizes into an ISZERO comparison.
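That missing case can be sketched the same way as the other patterns: instead of `PUSH4 0x00000000` followed by `EQ`, the compiler can emit a single `ISZERO`, giving a dispatch fragment like `DUP1 ISZERO PUSH2 <dest> JUMPI`. A naive byte-level check (a sketch only, with the same false-positive caveat as any raw-byte scan):

```python
# Sketch: the zero selector 0x00000000 is often dispatched via
# DUP1 ISZERO PUSH2 <dest> JUMPI instead of a PUSH4/EQ compare.
# Opcode values: DUP1=0x80, ISZERO=0x15, PUSH2=0x61, JUMPI=0x57.
DUP1, ISZERO, PUSH2, JUMPI = 0x80, 0x15, 0x61, 0x57

def has_zero_selector_dispatch(code: bytes) -> bool:
    for i in range(len(code) - 5):
        if (code[i] == DUP1 and code[i + 1] == ISZERO
                and code[i + 2] == PUSH2 and code[i + 5] == JUMPI):
            return True
    return False

fragment = bytes([DUP1, ISZERO, PUSH2, 0x00, 0x40, JUMPI])
print(has_zero_selector_dispatch(fragment))  # True
```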

Benchmarking

Note 1: We did not test gigahorse in this comparison because of its long execution time; I'm happy to retry if there is a config I can tune to get a response quicker.

Note 2: This isn't an entirely fair evaluation, as some of these tools do more than just extract selectors and therefore carry additional complexity.

Rank Model F1-Score Recall Precision
🥇 1 EVMole 0.9785 0.9588 0.9990
🥈 2 sevm 0.8980 0.8157 0.9989
🥉 3 Our naive pattern model 0.7986 0.6655 0.9983
4 whatsabi 0.7986 0.6655 0.9983
5 heimdall 0.7886 0.6514 0.9989
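As a quick sanity check on the table, F1 is the harmonic mean of precision and recall:

```python
# F1 = 2 * P * R / (P + R), the harmonic mean of precision and recall.
def f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

# Our naive pattern model: precision 0.9983, recall 0.6655
print(round(f1(0.9983, 0.6655), 4))  # 0.7986
```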

Obviously the dynamic analysis approaches beat the static analysis approaches. However, our naive implementation still achieves a pretty good F1-score.

But these relationships should also be learnable by a simple neural network, which should (hopefully) improve on our naive approach.

import torch

WINDOW_SIZE = 5

class SelectorDetector(torch.nn.Module):
    def __init__(self, vocab_size, classes, simple):
        super().__init__()
        self.simple = simple
        # Embed each opcode token in the window...
        self.head = torch.nn.Sequential(
            torch.nn.Embedding(vocab_size, 128),
        )
        # ...then produce per-position class logits.
        self.body = torch.nn.Sequential(
            torch.nn.Linear(128, 256),
            torch.nn.BatchNorm1d(WINDOW_SIZE),
            torch.nn.ReLU(),
            torch.nn.Dropout(0.3),
            torch.nn.Linear(256, 128),
            torch.nn.Sigmoid(),
            torch.nn.Linear(128, classes + 1),
        )

        self.apply(self._init_weights)

    def _init_weights(self, module):
        if isinstance(module, torch.nn.Linear):
            torch.nn.init.xavier_uniform_(module.weight)
            torch.nn.init.zeros_(module.bias)
        elif isinstance(module, torch.nn.Embedding):
            torch.nn.init.normal_(module.weight, mean=0, std=0.1)

    def forward(self, X):
        # Average the per-position logits across the window dimension.
        out = self.body(self.head(X))
        return out.mean(dim=1)
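To make the model's input format concrete, here is a hypothetical sketch of how abstracted opcode windows (with selectors and push data replaced by placeholder tokens, as in the pattern script) could be turned into the fixed-length token-id windows the embedding layer consumes. The vocabulary below is illustrative, not the one used in training.

```python
# Sketch: map abstracted opcode tokens to vocabulary ids, padding each
# window to a fixed WINDOW_SIZE. Vocabulary contents are illustrative.
WINDOW_SIZE = 5

vocab = {"<pad>": 0, "DUP1": 1, "PUSH4 <data>": 2, "EQ": 3,
         "PUSH2 <data>": 4, "JUMPI": 5, "<func_selector>": 6}

def encode(window: list[str]) -> list[int]:
    ids = [vocab.get(tok, 0) for tok in window[:WINDOW_SIZE]]
    return ids + [0] * (WINDOW_SIZE - len(ids))  # pad to fixed length

window = ["DUP1", "<func_selector>", "EQ", "PUSH2 <data>", "JUMPI"]
print(encode(window))  # [1, 6, 3, 4, 5]
```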
    
            
        

Now let's evaluate it and look at the results:

Rank Model F1-Score Recall Precision
🥇 1 EVMole 0.9785 0.9588 0.9990
🥈 2 Our torch model 0.9283 0.9429 0.9142
🥉 3 sevm 0.8980 0.8157 0.9989
4 Our naive pattern model 0.7986 0.6655 0.9983
5 whatsabi 0.7986 0.6655 0.9983
6 heimdall 0.7886 0.6514 0.9989

Nice! We are now only behind a dynamic analysis solution, not bad.

What about logs?

You can use the same technique to find the event log selectors as well. Unlike jump tables, though, the compiler may place the PUSH32 opcode carrying the topic0 far away from the LOG opcode that uses it, so a different technique is advised. For instance, since the topic0 is a hash, we can instead check that larger PUSH32 values contain a certain amount of randomness.
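One cheap proxy for "a certain amount of randomness" is the Shannon entropy of the 32 pushed bytes: keccak outputs have near-uniform bytes, while typical non-hash constants (masks, addresses, small integers) repeat bytes heavily. The 4.0 threshold below is an illustrative guess, not a tuned value; the topic0 used is the well-known ERC-20 Transfer event hash.

```python
# Sketch: score PUSH32 values by the Shannon entropy of their bytes.
# Hash-like values (e.g. event topic0s) score near log2(32) = 5 bits;
# constants like the all-ones mask score near 0. Threshold is a guess.
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

topic0 = bytes.fromhex(  # keccak256("Transfer(address,address,uint256)")
    "ddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef")
mask = bytes.fromhex("ff" * 32)

print(byte_entropy(topic0) > 4.0)  # True  (hash-like)
print(byte_entropy(mask) > 4.0)    # False (all bytes equal, entropy 0)
```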

Sometimes the compiler will also optimize the constant into the .data section, so beware.

Conclusion

Great: we wrote a selector extraction algorithm in a few hours using a data-driven approach, and benchmarked it to verify that it works.

Reading list

If you liked this blog post, you might also like the following posts (not written by me):