# ONNX Runtime accuracy testing tool
This tool measures the accuracy of a set of models on a given execution provider (EP). Accuracy is computed by comparing each model's outputs with the expected outputs, which are either loaded from file or obtained by running the model on the CPU execution provider.

## Build instructions on Windows
### Using an ONNX Runtime NuGet package
Download an ONNX Runtime NuGet package with the desired execution provider(s):
- [Microsoft.ML.OnnxRuntime](https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime)
- [Microsoft.ML.OnnxRuntime.QNN](https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime.QNN)
- [Microsoft.ML.OnnxRuntime.Gpu](https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime.Gpu)
- Others: https://www.nuget.org/packages?q=Microsoft.ML.OnnxRuntime

Clone this onnxruntime-inference-examples repository:
```shell
 git clone https://github.com/Microsoft/onnxruntime-inference-examples.git
 cd onnxruntime-inference-examples\c_cxx\accuracy_tool
```

Run `build.bat` with the path to the ONNX Runtime NuGet package as the first argument.
```shell
$ build.bat .\microsoft.ml.onnxruntime.1.18.0.nupkg
```

Run the following command to open the solution file with Visual Studio.

```shell
$ devenv .\build\onnxruntime_accuracy_test.sln
```

Alternatively, you can directly run the executable from the terminal:

```shell
.\build\Release\accuracy_test.exe --help
```

### Using an ONNX Runtime source build
#### Build ONNX Runtime from source
Refer to the documentation for [building ONNX Runtime from source](https://www.onnxruntime.ai/docs/build/) with the desired execution providers.

The following commands build ONNX Runtime from source with the CPU EP.

Clone the ONNX Runtime repository:
```shell
 git clone --recursive https://github.com/Microsoft/onnxruntime.git
 cd onnxruntime
```

Build ONNX Runtime from source. Replace `<ORT_INSTALL_DIR>` with your desired installation directory.
```shell
.\build.bat --config RelWithDebInfo --build_shared_lib --parallel --compile_no_warning_as_error --skip_submodule_sync --skip_tests --cmake_extra_defines CMAKE_INSTALL_PREFIX=<ORT_INSTALL_DIR>
```

Install ONNX Runtime to `<ORT_INSTALL_DIR>`:
```shell
 cmake --install .\build\RelWithDebInfo --config RelWithDebInfo
```

#### Build accuracy tool
Clone this onnxruntime-inference-examples repository:
```shell
 git clone https://github.com/Microsoft/onnxruntime-inference-examples.git
 cd onnxruntime-inference-examples\c_cxx\accuracy_tool
```

Run `build.bat` with the path to the ONNX Runtime installation directory as the first argument.
```shell
$ build.bat <ORT_INSTALL_DIR>
```

Run the following command to open the solution file with Visual Studio.

```shell
$ devenv .\build\onnxruntime_accuracy_test.sln
```

Alternatively, you can directly run the executable from the terminal:

```shell
.\build\Release\accuracy_test.exe --help
```

### Using an ONNX Runtime GitHub release package
Download an ONNX Runtime release package from https://github.com/microsoft/onnxruntime/releases/ and extract it to your desired installation directory (`<ORT_INSTALL_DIR>`).

Clone this onnxruntime-inference-examples repository:
```shell
 git clone https://github.com/Microsoft/onnxruntime-inference-examples.git
 cd onnxruntime-inference-examples\c_cxx\accuracy_tool
```

Run `build.bat` with the path to the extracted ONNX Runtime installation directory as the first argument.
```shell
$ build.bat <ORT_INSTALL_DIR>
```

Run the following command to open the solution file with Visual Studio.

```shell
$ devenv .\build\onnxruntime_accuracy_test.sln
```

Alternatively, you can directly run the executable from the terminal:

```shell
.\build\Release\accuracy_test.exe --help
```

## Setup test models and inputs
This tool expects all models and input files to be arranged in a specific directory structure.

```
models/
 |
 +--> resnet/
 |      |
 |      +--> model.onnx
 |      +--> model.qdq.onnx (quantized model only required for certain EPs like QNN)
 |      |
 |      +--> test_data_set_0/
 |      |        |
 |      |        +--> input_0.raw
 |      |        +--> input_1.raw
 |      |        +--> output_0.raw (optional, can be generated by tool)
 |      |        +--> output_1.raw (optional, can be generated by tool)
 |      +--> test_data_set_1/
 |      |
 |      +--> test_data_set_2/
 |
 +--> mobilenet/
        |
        +--> model.onnx
        +--> model.qdq.onnx
        |
        +--> test_data_set_0/
        +--> test_data_set_1/
```

- All ONNX models must be named either `model.onnx` or `model.qdq.onnx`.
  - The `model.qdq.onnx` file is only necessary for execution providers that run quantized models (e.g., QNN).
  - If the expected output files are not provided, the expected outputs will be obtained by running `model.onnx` on the CPU execution provider.
  - Both `model.qdq.onnx` and `model.onnx` must have the same input and output signature (i.e., same names, shapes, types, and ordering).
- The dataset directories must be named `test_data_set_<index>/`, where `<index>` ranges from 0 to one less than the number of dataset directories.
- The raw input files must be named `input_<index>.raw`, where `<index>` corresponds to the input's index in the ONNX model.
- The raw output files are not required if `model.onnx` is provided.
  - The raw output files must be named `output_<index>.raw`, where `<index>` corresponds to the output's index in the ONNX model.
  - The raw output files can be automatically generated by the tool by specifying the `--save_expected_outputs` (`-s`) command-line argument.
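
The naming rules above can be checked before running the tool. The following Python script is a minimal sketch, not part of the tool: `check_model_dir` and `is_zero_based_range` are hypothetical helper names, and the script assumes that dataset and input indices must be contiguous starting from 0. It only inspects file and directory names and does not open the ONNX models.

```python
import re
import tempfile
from pathlib import Path

DATASET_RE = re.compile(r"^test_data_set_(\d+)$")
INPUT_RE = re.compile(r"^input_(\d+)\.raw$")


def is_zero_based_range(indices):
    """Return True if indices are exactly 0, 1, ..., n-1 (in any order)."""
    return sorted(indices) == list(range(len(indices)))


def check_model_dir(model_dir):
    """Return a list of layout problems found in one model directory."""
    model_dir = Path(model_dir)
    problems = []

    # At least one of the two recognized model file names must be present.
    file_names = {p.name for p in model_dir.iterdir() if p.is_file()}
    if not file_names & {"model.onnx", "model.qdq.onnx"}:
        problems.append("no model.onnx or model.qdq.onnx found")

    # Collect test_data_set_<index> directories keyed by index.
    datasets = {}
    for p in model_dir.iterdir():
        match = DATASET_RE.match(p.name)
        if p.is_dir() and match:
            datasets[int(match.group(1))] = p
    if not datasets:
        problems.append("no test_data_set_<index> directories found")
    elif not is_zero_based_range(datasets):
        problems.append("test_data_set_<index> directories are not numbered 0..N-1")

    # Each dataset needs input_<index>.raw files (assumed contiguous from 0).
    for index, dataset in sorted(datasets.items()):
        inputs = [int(m.group(1)) for f in dataset.iterdir()
                  if (m := INPUT_RE.match(f.name))]
        if not inputs:
            problems.append(f"test_data_set_{index}: no input_<index>.raw files")
        elif not is_zero_based_range(inputs):
            problems.append(f"test_data_set_{index}: input files are not numbered 0..N-1")
    return problems


if __name__ == "__main__":
    # Build a tiny placeholder layout in a temporary directory and check it.
    with tempfile.TemporaryDirectory() as tmp:
        model_dir = Path(tmp) / "resnet"
        (model_dir / "test_data_set_0").mkdir(parents=True)
        (model_dir / "model.onnx").write_bytes(b"")
        (model_dir / "test_data_set_0" / "input_0.raw").write_bytes(b"\x00" * 4)
        print(check_model_dir(model_dir))
```

An empty list means that no naming problems were found. Whether the file contents are valid is still up to the tool.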

## Command-line options
```shell
.\accuracy_test --help

Usage: accuracy_test.exe [OPTIONS...] test_models_path

[OPTIONS]:
 -h/--help                        Print this help message and exit program
 -j/--num_threads num_threads     Number of threads to use for inference.
                                  Defaults to number of cores.
 -l/--load_expected_outputs       Load expected outputs from raw output_<index>.raw files
                                  Defaults to false.
 -s/--save_expected_outputs       Save outputs from baseline model on CPU EP to disk as
                                  output_<index>.raw files. Defaults to false.
 -e/--execution_provider ep [EP_ARGS]  The execution provider to test (e.g., qnn or cpu)
                                       Defaults to CPU execution provider running QDQ model.
 -c/--session_configs "<key1>|<val1> <key2>|<val2>"  Session configuration options for EP under test.
                                                     Refer to onnxruntime_session_options_config_keys.h
 -o/--output_file path                 The output file into which to save accuracy results
 -a/--expected_accuracy_file path      The file containing expected accuracy results
 --model model_name                    Model to test. Option can be specified multiple times.
                                       By default, all found models are tested.

[EP_ARGS]: Specify EP-specific runtime options as key value pairs.
  Example: -e <provider_name> "<key1>|<val1> <key2>|<val2>"
  [QNN only] [backend_path]: QNN backend path (e.g., 'C:\Path\QnnHtp.dll')
  [QNN only] [profiling_level]: QNN profiling level, options: 'basic', 'detailed',
                                default 'off'.
  [QNN only] [rpc_control_latency]: QNN rpc control latency. default to 10.
  [QNN only] [vtcm_mb]: QNN VTCM size in MB. default to 0 (not set).
  [QNN only] [htp_performance_mode]: QNN performance mode, options: 'burst', 'balanced',
             'default', 'high_performance', 'high_power_saver',
             'low_balanced', 'low_power_saver', 'power_saver',
             'sustained_high_performance'. Defaults to 'default'.
  [QNN only] [qnn_context_priority]: QNN context priority, options: 'low', 'normal',
             'normal_high', 'high'. Defaults to 'normal'.
  [QNN only] [qnn_saver_path]: QNN Saver backend path. e.g 'C:\Path\QnnSaver.dll'.
  [QNN only] [htp_graph_finalization_optimization_mode]: QNN graph finalization
             optimization mode, options: '0', '1', '2', '3'. Default is '0'.
```
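
When the tool is driven from a script, the `"<key1>|<val1> <key2>|<val2>"` strings used by `[EP_ARGS]` and `-c` can be assembled programmatically. The Python helpers below are a hedged sketch: `format_key_value_args` and `build_command` are hypothetical names, only flags listed in the help text above are used, and keys or values containing whitespace or `|` are rejected on the assumption that the tool treats those characters purely as separators.

```python
def format_key_value_args(options):
    """Join a dict into the 'key1|val1 key2|val2' form shown in the help text."""
    parts = []
    for key, value in options.items():
        for text in (str(key), str(value)):
            # Assumption: whitespace and '|' act only as separators.
            if "|" in text or any(ch.isspace() for ch in text):
                raise ValueError(f"unsupported character in {text!r}")
        parts.append(f"{key}|{value}")
    return " ".join(parts)


def build_command(exe, models_path, ep, ep_args=None,
                  session_configs=None, output_file=None):
    """Build an argument list suitable for subprocess.run()."""
    cmd = [exe, "-e", ep]
    if ep_args:
        cmd.append(format_key_value_args(ep_args))
    if session_configs:
        cmd += ["-c", format_key_value_args(session_configs)]
    if output_file:
        cmd += ["-o", output_file]
    cmd.append(models_path)
    return cmd


if __name__ == "__main__":
    print(build_command(
        "accuracy_test.exe", "models", "qnn",
        ep_args={"backend_path": "QnnHtp.dll"},
        session_configs={"session.disable_cpu_ep_fallback": "1"},
        output_file="results_0.csv",
    ))
```

Passing the resulting list to `subprocess.run` avoids shell quoting issues, because each `key|value` string is delivered to the tool as a single argument.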

## Usage examples
### Measure accuracy of QDQ model on CPU EP
- The expected outputs are generated by running the float32 `model.onnx` on CPU EP.
- Accuracy results (SNR) are dumped to stdout.

```shell
$ .\accuracy_test -e cpu models

[INFO]: Accuracy Results (CSV format):

model_a/test_data_set_0,17.640392603599537
model_a/test_data_set_1,21.326599488217347
model_a/test_data_set_2,16.712691432087745
...
```
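
This README does not state the exact formula the tool uses for SNR. As a rough guide to reading the numbers, the Python sketch below computes the common power-ratio definition of signal-to-noise ratio in decibels, treating the expected output as the signal and the difference from the actual output as the noise. The function name `snr_db` and the handling of the zero-noise and zero-signal cases are assumptions, and the tool's own values may differ.

```python
import math


def snr_db(expected, actual):
    """SNR in dB: 10 * log10(sum(expected^2) / sum((expected - actual)^2))."""
    if len(expected) != len(actual):
        raise ValueError("expected and actual must have the same length")
    signal = sum(e * e for e in expected)
    noise = sum((e - a) ** 2 for e, a in zip(expected, actual))
    if noise == 0.0:
        return math.inf   # identical outputs: no noise at all
    if signal == 0.0:
        return -math.inf  # all-zero expected output but nonzero error
    return 10.0 * math.log10(signal / noise)


if __name__ == "__main__":
    print(snr_db([3.0, 4.0], [3.0, 4.5]))
```

Under this definition a higher value means the outputs are closer to the expected outputs, which is consistent with the tool reporting a failure when SNR decreases.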

Use the `-o` command-line option to write the accuracy results to file.
```shell
$ .\accuracy_test -o results.csv -e cpu models

[INFO]: Saved accuracy results to results.csv
```
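
The saved results can be post-processed with a few lines of Python. The sketch below assumes the file has no header row and that each row holds a `<model>/<dataset>` name followed by one or more SNR values, as in the output shown above. `load_accuracy_results` is a hypothetical helper name, and the sample text reuses the values from the example output.

```python
import csv
import io


def load_accuracy_results(text):
    """Parse CSV text into {'<model>/<dataset>': [snr_output_0, ...]}."""
    results = {}
    for row in csv.reader(io.StringIO(text)):
        if not row or not row[0].strip():
            continue  # skip blank lines
        name, *values = row
        results[name.strip()] = [float(v) for v in values if v.strip()]
    return results


if __name__ == "__main__":
    sample = (
        "model_a/test_data_set_0,17.640392603599537\n"
        "model_a/test_data_set_1,21.326599488217347\n"
    )
    for name, snrs in load_accuracy_results(sample).items():
        print(name, snrs)
```

To read a real file, pass `Path("results.csv").read_text()` (from `pathlib`) as the `text` argument.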

### Dump (and load) the expected outputs to disk
Use the `-s` command-line option to dump the expected outputs to disk (e.g., `output_0.raw`). The expected outputs are obtained by running `model.onnx` on the CPU EP regardless of the EP passed to the `-e` command-line option.
```shell
$ .\accuracy_test -s -e cpu models

[INFO]: Accuracy Results (CSV format):

model_a/test_data_set_0,17.640392603599537
...
```

Use the `-l` command-line option to load the expected outputs directly from `output_<index>.raw` files.
```shell
$ .\accuracy_test -l -e cpu models

[INFO]: Accuracy Results (CSV format):

model_a/test_data_set_0,17.640392603599537
...
```

### Measure accuracy of QDQ model on QNN EP and detect regressions
- The expected outputs are generated by running the float32 `model.onnx` on CPU EP.
- Accuracy results (SNR) are dumped to `results_0.csv`.
- Uses the `-c` command-line option to disable fallback to CPU EP (i.e., entire graph runs on QNN EP).
- Note: you can also use the `-s` or `-l` command-line options to save or load the expected outputs as demonstrated above.

```shell
$ .\accuracy_test -e qnn "backend_path|QnnHtp.dll" -c "session.disable_cpu_ep_fallback|1" -o results_0.csv models

[INFO]: Accuracy Results (CSV format):

model_a/test_data_set_0,17.640392603599537
model_a/test_data_set_1,21.426599488217347
model_a/test_data_set_2,16.812691432087745
...
```

Use the `-a` command-line option to compare subsequent runs with previous accuracy results (e.g., results_0.csv). This can help detect accuracy regressions.

```shell
$ .\accuracy_test -a results_0.csv -e qnn "backend_path|QnnHtp.dll" -c "session.disable_cpu_ep_fallback|1" models

[INFO]: Accuracy Results (CSV format):

model_a/test_data_set_0,16.640392603599537
...


[INFO]: Comparing accuracy with results_0.csv

 [1] Checking if model_a/test_data_set_0 degraded ... FAILED
        Output 0 SNR decreased: expected 17.640392603599537, actual 16.640392603599537

 [2] Checking if model_a/test_data_set_1 degraded ... PASSED
 [3] Checking if model_a/test_data_set_2 degraded ... PASSED
 [4] Checking if model_a/test_data_set_3 degraded ... PASSED
...

[INFO]: 10/11 tests passed.
[INFO]: 1/11 tests failed.
```
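
The regression check can be approximated outside the tool, for example in a CI script that compares two saved result sets. The Python sketch below is not the tool's implementation: `find_regressions` is a hypothetical helper, the `tolerance` parameter is an assumption (this README does not say whether the tool allows any slack), and datasets missing from the new results are simply skipped.

```python
def find_regressions(expected, actual, tolerance=0.0):
    """Return (name, output_index, expected_snr, actual_snr) for each SNR drop.

    Both arguments map '<model>/<dataset>' to a list of per-output SNR values.
    """
    failures = []
    for name, expected_snrs in expected.items():
        actual_snrs = actual.get(name)
        if actual_snrs is None:
            continue  # dataset not present in the new run
        for index, (exp, act) in enumerate(zip(expected_snrs, actual_snrs)):
            if act < exp - tolerance:
                failures.append((name, index, exp, act))
    return failures


if __name__ == "__main__":
    previous = {
        "model_a/test_data_set_0": [17.640392603599537],
        "model_a/test_data_set_1": [21.426599488217347],
    }
    current = {
        "model_a/test_data_set_0": [16.640392603599537],
        "model_a/test_data_set_1": [21.426599488217347],
    }
    for name, index, exp, act in find_regressions(previous, current):
        print(f"{name}: output {index} SNR decreased: expected {exp}, actual {act}")
```

A small positive `tolerance` can be useful if run-to-run noise is expected on the target hardware, at the cost of hiding regressions smaller than that margin.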
