fix: pre-quantized models run optimize, only skip quantize#794
fix: pre-quantized models run optimize, only skip quantize#794DingmaomaoBJTU wants to merge 3 commits into
Conversation
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Overall this is a clean, well-motivated fix. The root cause (ORT Level 2 optimizer fusing QDQ patterns) is clearly explained and the change is minimal. Tests are updated correctly. A couple of minor suggestions below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Overall a clean, well-motivated fix. The root cause is clearly explained and the change is minimal. Tests are updated correctly. A few minor suggestions inline.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix overall. The root cause is well-identified and the solution is minimal and correct: skipping run_optimize_analyze_loop() entirely preserves QDQ structure for downstream EPs. Tests are properly updated. One minor issue noted inline.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix -- the root cause analysis is clear and the change is minimal. The pre-quantized path now correctly preserves the QDQ structure by skipping the ORT optimizer entirely. A few minor suggestions inline.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Clean, well-motivated fix. Skipping optimize_onnx entirely for pre-quantized models is the right call — ORT Level 2 fusing QDQ patterns into QLinearConv is a known foot-gun for QNN/DML EPs. The test updates are consistent. One minor nit below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix — the root-cause is clear and the change is minimal. Two minor issues noted inline.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix - the root cause (ORT Level 2 optimizer fusing QDQ patterns) is clearly identified and the solution is minimal. Two minor nits inline.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix — the root cause is clear and the change is minimal. A couple of minor suggestions below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix. The root cause (ORT Level 2 optimizer fusing QDQ patterns) is clearly identified and the change is minimal and correct. Tests are well updated. A couple of minor suggestions below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix overall. The root cause is well-explained and the change is minimal and correct. Left a couple of inline nits.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Reviewed — this looks good. The fix is clean and minimal. One minor observation:
src/winml/modelkit/build/onnx.py: The zeroed-out tuple �nalyze_iters, analyze_unsupported, analyze_details = 0, 0, {} is fine for now, but if �uild_onnx_model ever adds new return fields from the analyze loop, this line silently falls out of sync. Consider adding a brief comment noting this must track the return shape of
un_optimize_analyze_loop.
Tests are correctly updated. The docstring on est_pre_quantized_runs_analyze_only accurately reflects the new behavior.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Clean fix with good test coverage. The logic change is correct - skipping run_optimize_analyze_loop() entirely for pre-quantized models prevents ORT Level 2 QDQ fusion. Two minor issues noted inline.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Clean, well-motivated fix. Skipping optimize entirely for pre-quantized models is the correct approach. Tests properly updated.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Clean, well-scoped fix. The root cause analysis is solid and the test updates properly reflect the new behavior. One minor suggestion inline.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix - skipping optimize_onnx() entirely for pre-quantized models is the correct approach to preserve QDQ structure. A couple of minor nits below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix overall — the root cause is well explained and the approach is correct. Skipping the ORT optimizer entirely for pre-quantized models makes sense to preserve QDQ structure. A couple of minor nits below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix. Skipping ORT Level 2 optimization for pre-quantized models avoids QDQ pattern fusion that breaks EP compatibility. Code is clean, tests updated correctly.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix — the root cause is clear and the change is minimal. A couple of minor observations inline.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix. The root cause (ORT L2 optimizer fusing QDQ patterns) is clearly identified and the change is minimal and correct. Tests are properly updated. A couple of minor observations inline.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix - the root cause and verification are clearly documented. The code change is minimal and correct: pre-quantized models now bypass optimize_onnx() entirely, preserving QDQ structure for downstream EPs. Two minor nits below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix. The root cause (ORT Level 2 optimizer fusing QDQ patterns breaking EP compatibility) is well-identified and the solution is minimal. Tests updated correctly to assert_not_called(). One minor suggestion below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Clean fix overall. A couple inline suggestions.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix overall - the root cause is clear and the change is minimal. The tests are well-updated to match the new behavior. Left a couple of minor suggestions.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix overall — the root cause and solution are well-motivated. Skipping ORT Level 2 optimization for pre-quantized models is the correct approach to preserve QDQ structure. A couple of minor observations below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix overall. The root cause is well-identified and the change is minimal and correct. A couple of minor nits below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix overall — the root cause analysis is solid and the change is minimal. A couple of minor nits below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix overall — the root cause is clear and the change is minimal. A few observations inline.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Clean, well-motivated fix. The root cause (ORT L2 optimizer fusing QDQ patterns) is clearly documented and the solution is appropriately minimal. A few inline suggestions.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix overall. The root cause is well-identified and the solution is clean — skipping ORT Level 2 optimization entirely for pre-quantized models prevents QDQ fusion that breaks EP compatibility. The tests correctly reflect the new behavior. One minor nit: see inline comment about a stale log message.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix overall - the root cause is well-explained and the test updates are consistent. A couple of minor suggestions below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix - the root cause analysis is clear and the change is minimal. A couple of minor observations below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix - clear root cause, clean implementation. The QDQ fusion was silently regressing latency, and skipping optimize entirely is the right call. A couple of inline notes below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Clean, surgical fix with a well-documented root cause. The change is straightforward and the tests are properly updated. One consideration below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Overall the fix is correct and well-motivated. The PR description clearly explains the root cause and includes verification. A couple of minor suggestions below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Overall this is a clean, well-motivated fix. The root cause is clearly identified and the solution is minimal and correct. A couple of minor observations below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
LGTM - clean, well-scoped fix. One minor observation inline.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix. The root cause is well-identified (ORT Level 2 optimizer fusing QDQ patterns breaks QNN/DML EP compatibility). A few minor observations below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix overall. The root cause (ORT Level 2 optimizer fusing QDQ patterns) is clearly identified and the solution is minimal and correct. The test updates properly reflect the new behavior. A few minor observations inline.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Overall LGTM - clean, well-scoped fix. One minor suggestion below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix overall. The root cause (ORT Level 2 optimizer fusing QDQ patterns) is well-explained and the change is minimal and correct. Left a couple of inline suggestions below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix with clear root cause analysis. The change correctly prevents ORT Level 2 optimization from fusing QDQ patterns. One minor nit on a stale log message.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Review Summary: Clean, well-scoped fix. The root cause (ORT L2 optimizer fusing QDQ patterns) is clearly explained, the fix is minimal and correct, and test assertions are properly updated from �ssert_called_once() to �ssert_not_called(). The zero-initialization of �nalyze_iters, analyze_unsupported, analyze_details is appropriate since no analysis runs. current_path retains its prior value which is correct for pass-through. LGTM — no issues found.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Good fix! The root cause is well-identified and the approach is correct - completely skipping ORT optimization preserves QDQ patterns. A couple of minor nits below.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Overall this is a clean fix. The root cause is clearly identified and the solution is correct - pre-quantized models should bypass ORT Level 2 optimization entirely to preserve QDQ graph structure. Tests are properly updated. One minor nit on the log message.
DingmaomaoBJTU
left a comment
There was a problem hiding this comment.
Clean, well-motivated fix. The root cause (ORT Level 2 optimizer fusing QDQ patterns) is clearly identified and the solution — skipping the optimizer entirely rather than running it with special flags — is the correct approach for preserving QDQ structure that downstream EPs require. Tests properly updated. One minor observation: the \ est_skip_optimize_kwarg\ test at line 428 also changes to \�ssert_not_called()\ — confirming that the \skip_optimize\ kwarg path now also fully skips (consistent behavior). LGTM.
f83aa5c to
cebbf4d
Compare
1c2dee5 to
51e4841
Compare
xieofxie
left a comment
There was a problem hiding this comment.
Nice catch on the QDQ-skips-optimize issue and the -m model.onnx build-config bug. Two concerns worth addressing before merge.
1. skip_optimize=True no longer actually skips optimization. run_optimize_analyze_loop always calls optimize_onnx internally (see build/common.py:83), so the new control flow appends "optimize" to stages_skipped but still runs the full optimize+autoconf+analyze path. The log messages and stages_skipped then disagree with reality, and test_skip_optimize_kwarg (both test_hf.py and test_onnx.py, lines around 860 / 413) literally encodes the contradiction:
assert "optimize" in result.stages_skipped # claims skip
mock_pipeline["optimize"].assert_called_once() # but it ranThose two assertions can't both be true in a sane reading of "skip_optimize". They were untouched by this PR, but the underlying behavior changed around them — please reconcile (assert_not_called() if the intent is really skip, or rename the flag if the intent is "label as skipped while still running").
2. Silent semantic regression for direct skip_optimize=True callers. Before this PR, skip_optimize=True passed no max_optim_iterations, so the autoconf re-optimization loop didn't run (max_optim_iterations=0 default). After this PR, max_optim_iterations=hack_max_optim_iterations is passed unconditionally, so skip_optimize=True now triggers a full autoconf loop where it used to skip it. Not mentioned in the PR description.
Individual suggestions on the lines below. The pre-quantized fix and the commands/build.py fix both look correct in intent — just want the skip_optimize semantics tightened up so labels match work, and the is_quantized_onnx calls de-duplicated.
bdee3ff to
71eacb7
Compare
ORT Level 2 optimization fuses QDQ patterns (e.g. DQ->Conv->Q into QLinearConv), breaking QNN/DML EP compatibility. Pre-quantized models passed to build were being optimized despite the log saying 'skipping optimize', causing ~30x latency regression (500+ms vs 17ms) due to CPU fallback. Now build_onnx_model truly skips the optimize stage for pre-quantized models, preserving the original QDQ graph structure.
71eacb7 to
d7b1c38
Compare
Address reviewer feedback: - build/hf.py: skip optimize_onnx for pre-quantized models (same fix as build/onnx.py, prevents QDQ-to-QLinear fusion that breaks EP compat) - commands/build.py: detect ONNX file input and route to generate_onnx_build_config via onnx_path param, fixing 'not a valid JSON file' error when running winml build -m model.onnx without -c - Update test_hf.py assertions to match new skip behavior
13711d6 to
bb0c2fd
Compare
xieofxie
left a comment
There was a problem hiding this comment.
Two blocking concerns before merge — the implementation looks plausible, but the PR description argues for the opposite behavior and rests on a contradicted technical claim.
| # Skip optimize entirely for pre-quantized models. ORT Level 2 | ||
| # optimization fuses QDQ patterns (e.g. DQ→Conv→Q → QLinearConv), | ||
| # which breaks QNN/DML EP compatibility and causes CPU fallback. | ||
| analyze_iters, analyze_unsupported = 0, 0 | ||
| analyze_details: dict[str, Any] = {} |
There was a problem hiding this comment.
The PR description contradicts this code. The title/body say pre-quantized models now run the full optimize+autoconf+analyze loop and only skip quantize, and that "our custom optimizer is safe for QDQ models." This code does the opposite — it removes run_optimize_analyze_loop and skips optimize entirely (the updated tests flip optimize.assert_called_once() → assert_not_called()).
None of the "Changes" bullets match the diff: there is no is_qdq variable (still is_pre_quantized), no log message is changed, and skip_optimize now skips optimize rather than "restoring original semantics — runs optimize_onnx." Please rewrite the description to match the implementation before merge so the commit history / bisect aren't misleading.
| ) | ||
| ) | ||
| # Skip optimize entirely for pre-quantized models. ORT Level 2 | ||
| # optimization fuses QDQ patterns (e.g. DQ→Conv→Q → QLinearConv), |
There was a problem hiding this comment.
Unresolved technical premise. This comment says ORT Level 2 fuses QDQ→QLinear (so optimize must be skipped), but the PR description claims the optimizer is safe and does not fuse QDQ. These can't both be true, and the PR's correctness hinges on which is.
Note run_optimize_analyze_loop calls optimize_onnx(model, output, **config.optim) unconditionally (common.py:83), so whether fusion happens depends on the graph-optimization level config.optim applies to a QDQ model. Please confirm empirically — does optimize actually emit QLinearConv/QLinearMatMul on a sample QDQ model? — and state the verified reason here.
| @@ -160,17 +160,11 @@ def build_onnx_model( | |||
| "Skipping optimize + quantize, running analyze-only." | |||
There was a problem hiding this comment.
Stale/misleading log message. This still logs "...running analyze-only," but after this change the analyze loop is gone — analyze_iters/analyze_unsupported are hardcoded to 0 and no analysis runs. The message now actively misdescribes the behavior (and ironically the PR description claims to fix misleading logs, yet no log line is touched in the diff). Suggest updating to something like "Pre-quantized model: skipping optimize + quantize to preserve QDQ structure." Same applies to the matching line in hf.py.
Summary
Pre-quantized (QDQ) models were incorrectly skipping
optimize_onnx. Our custom optimizer is safe for QDQ models — it removes redundant Q/DQ identity pairs without fusing QDQ into QLinear* ops. This PR changes the behavior so pre-quantized models run the full optimize+autoconf+analyze loop and only skip quantize.Also fixes a bug where
winml build -m model.onnx(without-c) would fail with "not a valid JSON file" becausegenerate_build_configwasn't receiving theonnx_pathparameter.Changes
build/onnx.py&build/hf.py:is_quantized_onnx()result to a localis_qdqvariable to avoid redundant ONNX file reads.skip_optimize: restore original semantics — runsoptimize_onnx(appliesconfig.optimflags) but skips the autoconf re-optimization loop. Stage labels now accurately reflect actual work done.skip_optimizeflag).commands/build.py: Passonnx_pathtogenerate_build_configwhen input is a.onnxfile path.