AI Native – traceyang

A PvE Extraction-Shooter Prototype: An Empirical Test of an AI-Assisted Development Pipeline

June 10, 2026 by trace yang

Part Two of the series · Opening of the practical track. Continues from the overview piece, AI-Native Game Development — From Asset Reuse to Experience Reuse, whose methodology now enters its empirical phase within a real project.

A question that must be answered first

The overview piece systematically surveyed “the stages of game development at which AI can play a role” — retrieving references, translating designs, generating code, tuning values, executing verification, and integrating assets. The argument was complete.

Yet once the argument concluded, one question remained unresolved: Is this methodology merely a notion confined to the demonstration level, or a pipeline that can genuinely be put into operation?

The value of theoretical reasoning is limited. Accordingly, from this piece onward, this “AI-assisted development pipeline” is run once, in full, within a real project, in order to test whether it holds.

The project is defined as follows: atop an existing framework that has already been upgraded to UE5.7 and in which most systems are in place, build a third-person extraction-shooter prototype whose combat and orchestration experience is measured against a benchmark PvE shooter. The cycle is three months, led by the developer, with AI assisting.

There is a premise here that must be made explicit — the prototype itself is merely the vehicle. What truly undergoes examination is the AI-assisted development pipeline. This judgment is the central axis of the entire series, and the text below will return to it repeatedly.

Why choose the extraction-shooter genre, and why benchmark against this title

The core of the extraction-shooter genre is a closed loop: insert into the field → scavenge for resources → complete the objective → extract under pressure. Its appeal lies not in any single dazzling system, but in the continuous trade-off between “press deeper or extract safely,” and in the tension formed by that “last stand” during the extraction phase.

The benchmark title, taken as the reference object, pushes this experience to its limit: the oppression of large numbers of enemies closing in on the same screen; the satisfying feedback of calling in support from above via a command sequence; movement under heavy equipment that feels weighty rather than nimble; shooting feedback in which every hit carries weight; and one critical premise: it is a four-player cooperative game whose mechanics make solo play difficult to sustain.

What this project sets out to reproduce is this “felt experience,” not a verbatim copy of its system implementation. This delineation determines what AI should, and should not, undertake in this project.

The first task before any hands-on work: complete a systematic inventory

Many people’s notion of “making a game with AI” is to open the editor and immediately have AI start generating content. This project’s first step is the opposite: first have AI thoroughly survey the framework and establish which systems are already in place, which need to be supplemented, and which must be built from scratch.

This is precisely one of the most underrated of AI’s capabilities: code archaeology. The conclusions of the inventory make the overall landscape clear at a glance:

Largely in place (a matter of adjusting configuration): weapon damage (data-table-driven), scavenging and pickup, extraction and insertion, basic character controls
Partially in place (a key link to be supplemented): the camera (first-person is complete; the first ↔ third-person switch is missing), enemy AI (behavior trees and crowd management are complete; enemy waves are missing)
Accumulated but not yet connected: the asset pipeline for the target source material (the toolchains for unpacking, material reverse-engineering, and level reverse-engineering are all mature, but not yet integrated into the framework)

Only three things genuinely need to be built from scratch: the first ↔ third-person view switch, the stratagem call-down system, and the enemy-wave system. Of these, the stratagem call-down is the only entirely new gameplay system — the rest are reuse, tuning, and asset integration.

This judgment established the priorities of the entire plan: effort is not spent on reinventing the wheel, but is concentrated on these three items and on “dialing in an accurate experience.”

🔧 Design Retrospective · Why inventory first, then act
Why this design: The greatest risk of starting work without an inventory is duplicated construction — spending two weeks implementing a feature that is already ninety percent complete within the framework. By first having AI clarify the present state, effort can be directed toward the genuinely empty spaces.
Where it bit us: An inventory is not a matter of having AI “read through the code and offer a problem-free conclusion.” It must produce a present-state list that can be grounded in specific call chains and specific data-table fields; otherwise it amounts to having taken no inventory at all.
How the traditional approach falls short: For a person to read through hundreds of thousands of lines of unfamiliar code in order to clarify a framework is a workload measured in weeks. Code archaeology compresses this to a workload measured in days — and this is the precondition for fitting the entire plan within a three-month cycle.

A judgment derived from the inventory: multiplayer is not a burden, but an existing dividend

One finding from the inventory directly revised the workload estimate.

The core of the extraction-shooter genre is cooperation. A common expectation is that, within a three-month solo prototype cycle, implementing multiplayer would cause the workload to spiral out of control: state synchronization, property replication, multi-client consistency — these are typically the difficulties that consume a development cycle.

A survey of the framework’s network layer led to a different conclusion: this is a production-grade multiplayer shooter framework. Within a single core module alone there are nearly two hundred network-replication declarations; character movement, properties, and state machines are all designed around the replication mechanism.

This reframes multiplayer. The question is no longer “whether to do it,” and the task is not “building the network layer”; it is “reusing the existing network layer, while avoiding new systems implemented as single-player logic that is not multiplayer-ready.”

For this reason, multiplayer was not set up as a separate “multiplayer milestone” — that would lead to multiplayer being integrated last, with all earlier systems implemented in a single-player manner and then reworked. It is instead set up as a cross-cutting principle: from the very first milestone, all new systems are designed in a multiplayer-ready manner, with state following the existing replication paths. Verification is carried out directly in the form of multi-player cooperation.

So what kind of constraint, specifically, is “designed in a multiplayer-ready manner”? It is not abstract; grounded in code, it amounts to several explicit guidelines.

Consider one of the most error-prone counter-examples. In the stratagem call-down, the player inputs a command sequence, throws a beacon, and seconds later an orbital strike descends from above.

The not-multiplayer-ready implementation: the client detects the beacon’s landing locally, spawns the orbital strike locally, and resolves the damage locally. It runs smoothly in a single-player environment, but breaks the moment multiplayer is introduced: each client computes on its own, and the landing point, the damage, and even “whether that support arrived at all” as seen by teammates may all be inconsistent.

The multiplayer-ready implementation: the beacon throw is treated as a request; the server determines the landing point and timing, spawns the support entity, resolves the damage, and then replicates the results to all clients. The presentation layer (projectile trajectory, screen shake, sound effects) may be rendered locally, but any determination that affects game state must rest with a single authority.
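The request-then-replicate flow described above can be sketched in a few lines. This is a minimal, engine-agnostic illustration in plain Python, not the framework's replication code: the `Server` and `Client` classes, the 3-second delay, the blast radius, and the damage value are all assumptions made for the example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Client:
    """Presentation side: receives replicated results, never resolves damage."""
    name: str
    events: list = field(default_factory=list)

    def on_replicated(self, event: dict) -> None:
        # Local VFX, screen shake, and sound would be triggered from here.
        self.events.append(event)


@dataclass
class Server:
    """Single authority over landing point, timing, and damage (hypothetical)."""
    enemy_hp: dict
    enemy_pos: dict
    clients: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def request_strike(self, beacon_pos, now, delay=3.0):
        # A beacon throw is only a request; the server fixes where and when.
        self.pending.append({"pos": beacon_pos, "at": now + delay})

    def tick(self, now, radius=5.0, damage=100):
        # Resolve every strike whose impact time has arrived.
        due = [s for s in self.pending if s["at"] <= now]
        self.pending = [s for s in self.pending if s["at"] > now]
        for strike in due:
            hp_after = {}
            for enemy, pos in self.enemy_pos.items():
                if math.dist(pos, strike["pos"]) <= radius:
                    self.enemy_hp[enemy] = max(0, self.enemy_hp[enemy] - damage)
                    hp_after[enemy] = self.enemy_hp[enemy]
            event = {"impact_pos": strike["pos"], "hp_after": hp_after}
            for client in self.clients:
                client.on_replicated(event)  # every client sees the same result
```

Because clients only ever append what the server sends, two clients attached to the same server cannot disagree about whether the strike arrived or how much health an enemy has left.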

The same guideline runs through every system: who determines an enemy’s health and death, who advances mission progress, whose clock governs the extraction countdown — the answer is always “server-authoritative, client-presentational.” This is not a profound architecture, but rather a default habit at coding time: for each newly added field that will alter game state, first confirm “whether it needs to be replicated, and who modifies it.”

This is precisely where AI assistance can play the greatest role, and also where it is most prone to error. Where it plays a role: AI is able to understand the usage of the framework’s existing replication macros and, following the established paradigm, wire replication into the state fields of new systems. Where it errs: absent human oversight, AI-generated code will tend toward single-player logic of “modify locally and it just runs” — because within the tendencies formed by its training, the single-player implementation is shorter and feels more “self-consistent.” For this reason, along the multiplayer track, the human’s responsibility is to continually ask “is this state consistent in a multiplayer environment.”

This is treated separately because it points to a more general judgment — in AI-assisted development, what is most irreplaceable about the human is often not writing code, but holding those systematic constraints that lie outside AI’s field of view. AI excels at giving a swift and accurate solution within a delimited local scope; but constraints such as “this code must hold in a multiplayer environment” and “this field will be observed by four clients simultaneously” are not within its current context, and are therefore difficult for it to take into account. Multiplayer is merely the most typical example of such a constraint. Across the entire prototype, the real center of gravity of the work lies not in “getting AI to produce code,” but in “guarding, on AI’s behalf, the boundaries it cannot see.” This is precisely one of the core matters the series is to validate: whether this division of labor — the human holds the constraints, AI produces the solutions — can operate stably.

🔧 Design Retrospective · Why multiplayer is a principle rather than a milestone
Why this design: To place multiplayer as a standalone phase at the end is tantamount to tacitly permitting all earlier systems to be implemented in a single-player manner first and then retrofitted — the arrangement with the highest rework cost.
Where it bit us: The initial framing nearly handled it as a standalone “to be revisited later” module. The true risk lies not in the network layer (which is already in place), but in whether new systems get casually implemented as non-replicable single-player logic. Bringing it forward as a principle is precisely what averts such rework at the source.
How the traditional approach falls short: The traditional practice often treats multiplayer enablement as a dedicated “retrofit project.” When the underlying network layer is already mature, the more economical approach is to let it recede into invisibility — folded into the design constraints of every system, rather than advanced as a separate initiative.

The route: first connect what is “playable,” then enrich what is “replayable”

The order of progression across the thirteen weeks was designed, not arranged in numerical sequence.

Stage One · Initiation + three preliminary validations (W0-1). Confirm that the framework can complete a full-project build and can launch the editor; at the same time, run preliminary validation on the three least certain pipelines: assets (can they be unpacked and integrated), network (can multiplayer stand up a server and run), and mission/HUD (how far they are already in place). Expose the high-risk items in the very first week, rather than deferring them to the end only to discover that they are unworkable.

Stage Two · View overhaul (W1-3, first deliverable). Overhaul the first-person view into a third-person view. This is precisely the first empirical test of the overview piece’s argument about “reworking an FPS into a TPS.”

Stage Three · Movement and gunplay feel (W4-5). The heavy feel of heavy-equipment movement, and solid shooting feedback.

Stage Four · Stratagem call-down (W6-7, built from scratch). Calling in support from above via a command sequence is the most iconic gameplay of the benchmark title, and also the only entirely new gameplay system in this project. It is relatively self-contained and does not depend on enemies, so it is deliberately scheduled ahead of the enemies, in order to validate the complete chain of “input → beacon → descent from above” in an empty-field environment.

It warrants noting separately: this is the only genuinely 0-to-1 new system in the entire prototype. The remaining milestones either rework systems already in place or make increments on an existing architecture; this one alone begins from a blank sheet. For this very reason, it will become the litmus test of whether “AI-assisted from 0 to 1” holds. In the “reuse and tuning” links above, AI’s value is relatively predictable; but whether AI can provide support all the way from design translation through to code generation, for a system with no precedent within the framework, is the project’s greatest unknown. The key indicator to observe in this phase is how much support AI can actually provide when “there is no ready paradigm to follow.”
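As an illustration of the first link in the “input → beacon → descent from above” chain, here is a minimal sketch of matching a directional command sequence against a table of stratagems. The stratagem names and their sequences are invented for the example, and the three-state result (`armed`, `pending`, `reset`) is an assumed design, not something the framework or the benchmark title prescribes.

```python
# Hypothetical stratagem table: name -> required directional inputs.
STRATAGEMS = {
    "orbital_strike": ("up", "right", "down", "down"),
    "resupply": ("down", "down", "up", "right"),
}


def match_sequence(inputs, table=STRATAGEMS):
    """Classify the inputs entered so far.

    Returns ("armed", name) on an exact match, ("pending", None) while the
    inputs are still a prefix of at least one sequence, and ("reset", None)
    once no sequence can be completed from them.
    """
    seq = tuple(inputs)
    for name, code in table.items():
        if seq == code:
            return ("armed", name)
    if any(code[:len(seq)] == seq for code in table.values()):
        return ("pending", None)
    return ("reset", None)
```

Keeping the table as data means adding a stratagem variety is a configuration change; in a multiplayer-ready version, an `armed` result would still only produce a request to the server rather than spawn anything locally.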

Stage Five · Monsters (W8-9, built from scratch). The readable threat of a single enemy, and the swarming pressure of the crowd.

Stage Six · Loop + missions + multiplayer (W10-11). Integrate the components into one complete, playable round of the extraction-shooter loop. This stage juxtaposes three items because they are inherently three inseparable facets of “one round”: the loop skeleton, the mission system, and multi-player cooperation.

Stage Seven · Replayability polish + asset integration (W11-13). Dial in the driving force of “one more round,” and integrate the assets of the benchmark title in full. Meanwhile, UI/HUD does not belong to any single week; it runs throughout, supplemented progressively alongside each system: without clear conveyance of information, however complete a system may be, the player has no way to perceive it.

Of the three items in Stage Six, the one most worth treating separately is the mission system

Loop, missions, and multiplayer are juxtaposed in the same stage not because they are trivial enough to bundle together — quite the opposite: they are three indispensable facets of “one round.” Among them, the mission system is the one most easily regarded as an “ancillary feature,” yet the one that least ought to be treated as such.

First, a counter-intuitive fact: a superficially complete extraction-shooter loop may be entirely idling.

Insert, scavenge, engage, extract — chained together, these four steps do allow the player to walk through from beginning to end. But absent “what the objective of this round is,” it amounts to a machine running in place: the player invests a great deal of action, yet has no reason whatsoever to venture into the deeper, more dangerous areas of the map. A rational player will arrive at an optimal strategy that dulls the experience — land, scavenge one round nearby, and immediately extract. The risk is lowest, the reward acceptable, and the loop still closes. But the round is devoid of tension, because no mechanism prompts the player to take on risk.

The mission system is precisely the mechanism that prompts the player to take on risk. It sets a primary objective for the round — destroy a facility, collect a sample, upload a piece of data — and that objective happens to lie deep in the map, and happens to take time. Once it is introduced, the player’s decisions acquire substance: is it worth venturing into danger to complete the primary objective? Is the extra reward offered by a side objective worth lingering two more minutes and enduring one more enemy wave?

More critical still is the way it couples with extraction. Extraction should not be an escape route available at any time; it should be constrained by the completion of the primary objective. Extracting without completing the primary objective counts the round as only half-achieved; only once the primary objective is complete is the extraction point truly activated, or the extraction conditions improved. This coupling welds “extraction shooting” from three independent actions into a complete narrative with an initial motive, a climax, and a cost: the player enters with a mission, completes it amid the enemy waves, and then extracts amid even fiercer waves. The tension of the “last stand” springs precisely from this: what the player defends is not only the resources scavenged, but the objective hard-won in this round.

Once it is established as the driving core, the implementation path actually becomes clearer. The prototype stage does not plan to stack up a large number of mission types; one or two are enough to validate the experience, for example “destroy a facility” and “collect a sample.” But the missions are implemented as data-driven: objective type, objective location, completion conditions, and the coupling relationship with extraction are all abstracted into configuration, rather than hard-coded into a particular level. In this way, once the experience has been validated, extending to more mission types is merely a matter of configuration, with no need to go back and modify the core loop.
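A minimal sketch of what “missions as configuration” might look like, assuming the simpler of the two couplings mentioned above (the extraction point is only activated once the primary objective is complete). The field names, location labels, and the `gates_extraction` flag are hypothetical; the real framework's data-table schema is not shown in this piece.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MissionConfig:
    """One row of mission configuration (all field names are assumptions)."""
    objective_type: str     # e.g. "destroy_facility", "collect_sample"
    location: str           # hypothetical map marker name
    required_count: int     # completion condition
    gates_extraction: bool  # True for a primary objective


@dataclass
class MissionState:
    """Runtime progress for one configured mission."""
    config: MissionConfig
    progress: int = 0

    def record_progress(self, amount: int = 1) -> None:
        self.progress = min(self.config.required_count, self.progress + amount)

    @property
    def complete(self) -> bool:
        return self.progress >= self.config.required_count


def extraction_available(missions) -> bool:
    # Extraction opens only when every gating (primary) objective is done;
    # side objectives never block it.
    return all(m.complete for m in missions if m.config.gates_extraction)
```

Adding a third mission type under this shape means adding a row, not touching `extraction_available` or the loop that calls it. In a multiplayer build, `MissionState` would live on the server and be replicated like any other game state.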

This line of thinking — “validate the experience first, then scale by configuration” — in fact runs through the entire plan; stratagem varieties, enemy types, and the wave curve all follow this path. Behind it lies a plain judgment: what the prototype stage must validate is “whether this experience holds,” not “whether the content is sufficient.” Content is the part worth scaling only after the experience holds. AI’s role here is to assist in designing well the configuration structures of these systems and in populating the first batch of data; the final judgment of “whether this experience holds” still belongs to the developer.

This is precisely why it is called the driving core rather than a feature. A feature is a component that can be appended after the fact; the driving core determines the direction in which the entire system exerts its force.

🔧 Design Retrospective · The mission system: a driving core re-extracted from the “feature list”
Why this design: The loop skeleton (insert → scavenge → engage → extract) appears complete, yet lacks the link of “why enter this round.” Without an in-round objective, the extraction-shooter loop degrades into a mere mob-grinding arena. The mission system is what drives the player deep into the map and forms the tension of “complete the objective, then extract.”
Where it bit us: It did not exist in the initial plan. Reviewing the loop in hindsight exposed the problem — it was a loop lacking an initial motive. It should not be an ancillary feature of some milestone, but should couple directly with the “extraction conditions.”
How the traditional approach falls short: Treat the mission as “one more feature,” and the loop is a machine idling in place. Treat it as the driving core, and the loop acquires direction.

The kernel: what AI is delineated to undertake in this pipeline

This is what truly undergoes examination. But before discussing what AI undertakes, one must first lay out a way of decomposition that runs through the entire process — it determines where AI’s leverage actually lands.

Every experience goal is broken down into two parts: function and values. Function is the skeleton: fire can land a hit, a beacon can summon, enemies can advance in swarms — it resolves “whether it exists.” Values are the feel: how strong the recoil is, how long the TTK runs, along what curve the waves ramp up — they resolve “whether it is right.” Once the function is in place, what truly decides whether the experience holds together lies almost entirely on the values side.

The key realization: the value tuning here is experience-oriented in nature, not engineering-oriented. Shifting the recoil curve from A to B looks like changing a single config entry, but in substance it answers “does this gun feel solid enough to fire”; raising wave density by thirty percent looks like changing a spawn parameter, but in substance it answers “is the suffocating sense of being overwhelmed strong enough.” Values are not cold configuration — they are the experience itself, in its quantified form. Precisely for this reason, the act of tuning values cannot be handed off to AI entirely — because “how far to tune before it’s right” is an experiential judgment.

This way of decomposition is exactly the shared premise of the three divisions of labor below: on the “function” side AI can take over a great deal of the work, while on the “values” side it can only assist in closing the gap, with the final call left to the human.

🔧 Design Retrospective · Why “values are experience,” not “values are configuration”
Why this design: Decomposing the experience into function and values is meant to bring the elusive question of “does the feel hold up” down onto operable objects. The values side is singled out for emphasis because it is the most easily mistaken for a purely engineering parameter — and once that happens, tuning degrades into “settling on a value that does not throw an error,” and the experience falls out of the picture entirely.
Where it bites: The danger of values is that they look too much like ordinary configuration, and are therefore all too easily brushed off by “filling in a plausible value in passing.” But behind TTK, spread, and the wave curve stands the player’s actual perception; fill in one wrong value, and the skeleton still runs while the experience has already collapsed — and that collapse throws no error, it can only be sensed by a human.
How the traditional approach falls short: The traditional division of labor often severs “values design” from “engineering implementation,” leaving values reduced to isolated entries in a spreadsheet. Only by explicitly anchoring values as “the quantified form of the experience” does tuning gain a basis for judgment — what it measures against is not some technical metric, but a concrete experience goal.

First, reference alignment leads. For the “feel” links such as view, movement, and shooting: the developer examines the benchmark and determines the direction, and AI is responsible for distilling hard-to-articulate sensations such as “heavy” and “solid” into a tunable parameter baseline table (arm length, shoulder offset, recoil curve, acceleration) and then tuning against the baseline. AI’s value here is to convert subjective feel into objective, tunable values.
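To make the idea of a “parameter baseline table” concrete, here is one possible shape for it, sketched under assumptions: every parameter name, number, range, and unit below is a placeholder invented for illustration, not a value measured from the benchmark title or taken from the framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TunableParam:
    """One row of the baseline table: a starting value plus a safe range."""
    name: str
    baseline: float
    low: float
    high: float
    unit: str


# Placeholder rows only; real values would come from studying the benchmark.
BASELINE = {p.name: p for p in (
    TunableParam("camera_arm_length", 300.0, 200.0, 450.0, "cm"),
    TunableParam("shoulder_offset", 50.0, 20.0, 90.0, "cm"),
    TunableParam("recoil_pitch_per_shot", 0.8, 0.2, 2.0, "deg"),
    TunableParam("max_acceleration", 1200.0, 600.0, 2048.0, "cm/s^2"),
)}


def clamp_to_range(name: str, value: float) -> float:
    """Keep a proposed tuning value inside the row's allowed range."""
    p = BASELINE[name]
    return min(p.high, max(p.low, value))


def deviation_from_baseline(current: dict) -> dict:
    """Relative drift of each currently tuned value from its baseline."""
    return {
        name: (value - BASELINE[name].baseline) / BASELINE[name].baseline
        for name, value in current.items()
    }
```

The table gives tuning a fixed point to measure against: a report such as “arm length is 10% above baseline” is something the developer can review, while the decision about whether that feels right stays with the developer.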

Second, the machine collects data and the human evaluates. For the “verification” links such as enemies, oppression, and balance: the machine is responsible for running, instrumenting and collecting data, and organizing the data into curves, such as whether the reaction window is sufficient, how long the player survives before being killed, and where the pressure peak occurs. But the authority over the judgment of “whether the experience is accurate” always belongs to the developer. AI does not score.
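A small sketch of the “collect data, organize into curves” half of this division of labor. The event and sample formats are assumptions made for the example; the point is that the functions only summarize and deliberately return no verdict about whether the experience is good.

```python
def time_to_kill(events):
    """Per-enemy seconds from first hit to death.

    events: iterable of (time_s, enemy_id, kind), where kind is
    "first_hit" or "death" (a hypothetical instrumentation format).
    """
    first_hit, ttk = {}, {}
    for t, enemy_id, kind in sorted(events):
        if kind == "first_hit":
            first_hit.setdefault(enemy_id, t)
        elif kind == "death" and enemy_id in first_hit:
            ttk[enemy_id] = t - first_hit[enemy_id]
    return ttk


def pressure_curve(samples, bucket_s=10):
    """Max number of nearby enemies in each time bucket.

    samples: iterable of (time_s, enemies_near_player).
    Returns {bucket_start_s: max_count}, ordered by time.
    """
    curve = {}
    for t, count in samples:
        bucket = int(t // bucket_s) * bucket_s
        curve[bucket] = max(curve.get(bucket, 0), count)
    return dict(sorted(curve.items()))


def pressure_peak(curve):
    """(bucket_start_s, count) of the highest-pressure bucket."""
    return max(curve.items(), key=lambda item: item[1])
```

The output is a curve and a peak location, which is what the developer reviews; nothing here assigns a score.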

This point warrants emphasis, because it is the one most prone to drifting in execution. One common notion is: simply have AI watch a recording and score it automatically. This project deliberately does not adopt this approach. “Whether this enemy carries a sense of threat” and “whether this wave’s sense of oppression is accurate” are experience judgments, not data judgments. What AI is tasked with is to convert a fuzzy experience into objective data the developer can review, so that the judgment has a basis; but the final decision is made by the human.

This holds the central axis of the entire series: AI assists, the human decides. The verification link is no exception.

Third, when building from scratch, emphasis falls on function and presentation. For an entirely new system such as the stratagem call-down: the emphasis is on the system being functionally complete and the call-down carrying satisfying feedback, with AI’s code generation receding to an auxiliary position — it assists in swiftly converting design into code, but “what form this system should take” is determined by the developer.

🔧 Design Retrospective · Is “AI does not score” a regression or a step forward
Why this design: Handing the scoring authority to AI appears more automated and more advanced. But there is no objective standard answer to whether an experience is good or bad; having AI score is no different from handing a question that has no standard answer to a respondent that will fabricate an answer in all earnestness.
Where it bit us: A more aggressive idea would be to have AI “watch the replay and assess the emotion curve.” The conclusion is that it can reason from the exported data series, but “watching the entire video stream on its own and scoring the experience” is neither feasible nor proper. Rather than package it as fully automated, it is better to draw the boundary honestly.
How the traditional approach falls short: The traditional practice is to invest manpower in conducting surveys and focus groups to measure the experience. AI does not replace this judgment, but it automates the laborious link of “collecting data and organizing curves,” so that the human’s judgment comes faster and with more basis.

Points not yet certain (listed in advance)

As is the custom, the matters that remain uncertain at present are listed:

Whether the framework can complete a full-project build, and whether it can launch the editor — at present only some modules are confirmed to compile, and “full-project operability” has not yet been tested in practice. This is the foremost matter to resolve in Stage One, and may also constitute an obstacle in the very first week.
How effective the “parameter baseline table” distilled by AI actually is — this is the crux of the entire “reference alignment” line of thinking. If the baseline deviates from reality, the value of this pipeline must be discounted.
Whether the “replicable design” of multiplayer harbors hidden hazards throughout — the network layer being in place does not equate to new systems remaining consistent once integrated. Multi-client state consistency must be confirmed through actual operation.
The numeric values in the acceptance criteria (the hit-time interval, the duration of a single round) are all placeholder values at present, and their true values will only be known once the actual output exists.

These uncertainties are precisely the reason this series is worth recording — if everything were already settled, there would be no need for validation.

What follows

This piece is the project initiation note. From the next piece onward, it will progress with the development schedule: practice first, then record. Completion will be recorded faithfully as completion; obstruction will be recorded faithfully as obstruction, including the links at which AI failed to provide effective help.

What this series sets out to validate has never been “the ceiling of AI’s capability,” but rather whether a single developer, with the aid of AI, can advance a prototype from an existing framework to a playable state.

This answer remains undetermined; the process of validation is itself the value of this series.

Do the judgments established in the overview piece hold true: that AI can retrieve references, translate designs, and tune values, while holding the bottom line of “the human decides”? Three months from now, this prototype will provide the answer.

(This article is a project initiation note for the series. All game names referenced are anonymized throughout. Subsequent practical pieces will be produced in step with the development schedule.)

AI-Native Game Dev in Practice · From Asset Reuse to Experience Reuse

June 8, 2026 by trace yang

In AI-Native Game Dev in Practice · The Full Gameplay System Map, I folded combat, enemies, and levels into a single through-line: experience goal → features + numbers (config-driven) → AI-assisted execution → validate and iterate. That piece kept hammering on one sentence — validate that it’s fun before you pour in assets.

But back then I waved off “art” with a light touch: during validation you only need a “good enough” placeholder — blockmen, gray models, temp effects. That’s true, yet it dodges a question that real production simply can’t get around:

Where does that “good enough” placeholder actually come from?

If you still have to build the placeholder from scratch, it isn’t “fast enough.” If the placeholder is too crude (a pile of graybox cubes), the experience you validate isn’t “real enough” — you’re validating the rhythm of a graybox, not the rhythm of the game. On the face of it these two demands contradict each other: it has to be fast, and it has to be real.

To put it even more bluntly: validation usually stalls not because you don’t know the method, but because you’re missing one piece of material that is both good enough and real enough. However elegant the methodology, with nothing to actually run, validation stays stuck in a slide deck.

This piece is about satisfying both demands at once. The answer isn’t “make new assets faster,” it’s reusing existing assets — taking the inventory assets you can already get your hands on (including other projects, other sources), reorganizing them with AI, and rapidly validating during the development and testing of gameplay; meanwhile, the iteration of production art assets runs in parallel.

This is the expansion of the “validation-phase art” cell from the Full System Map. On the surface it’s an article about art, but its anchor from start to finish is not “make the art beautiful”; it’s how art rapidly partners with design and engineering to get gameplay validated. Beautiful rendering, lighting, and materials are left to later chapters.

Asset Source Disclaimer · The third-party game assets referenced in this article serve only as demonstration samples for proving a method is feasible — they illustrate that the reuse-validation flow actually runs, and do not constitute asset deliverables for any real project. In a real project, such external assets serve only as temporary validation placeholders during development — once gameplay passes validation, the production-asset track replaces them with self-made assets. That is precisely the point of the “dual-track parallel” approach discussed in the second half of this article. Throughout, I refer to “a certain FPS,” “a certain open-world title,” and the like; I’m not targeting any specific work, the focus is on the method.

1. Why Do Asset Reuse at All

Let’s not talk about how yet. First, get clear on what, in the matter of gameplay validation, AI-driven asset reuse actually creates — because only once the value is clear do the methods and trade-offs that follow have any basis.

I split it into two groups: one is efficiency value, on how reuse makes validation faster and cheaper; the other is incremental value, on the qualitative shift reuse brings beyond efficiency — it lets you make things you fundamentally couldn’t have made otherwise.

Efficiency Value: Faster Validation, Deferred Investment

The first group of value has one through-line: cutting waste. It consists of four steps, each resting on the one before.

First, decoupling gameplay validation from art capacity. This is the foundation of the whole group. In traditional development, gameplay validation is held hostage by the art schedule — a designer wants to validate a new mechanic, but has to queue up and wait for art to deliver assets; until the assets are ready, validation can’t run. Reuse severs that dependency: inventory assets are available any time, validation can run any time, no longer waiting on anyone’s schedule. Once decoupled, the next three values become possible.

Second, preempting trial-and-error cost; deferring the investment decision. Since gameplay can now be validated any time, validation gets moved ahead of investment. The traditional flow is “build assets first → then validate gameplay” — the money is already spent before validation; and the moment validation fails, that up-front investment is sunk. The reuse flow inverts the order: validate gameplay with inventory first, invest in production assets only after it passes. Art money is spent only on directions that have already been validated. Investment shifts from an “up-front cost” to a “downstream decision.”

Third, high-fidelity validation. This one solves the “real enough” problem. With graybox cubes you can only validate whether the space connects and the logic runs; you can’t validate whether the rhythm is right, whether friend and foe read clearly, whether combat carries any emotion. What reuse pulls in are assets from other projects that are near-finished — real models, animation, effects, sound. The conclusions you draw from running it are naturally far more trustworthy than a graybox’s. This is exactly the key step that lifts the “good enough placeholder” from the overview piece from “graybox grade” to “near-finished grade.”

Fourth, reuse-validation and production assets on dual parallel tracks. Once the prototype stitched together from reuse already approaches the real thing, it’s no longer just a one-off validation tool — it becomes the starting point and reference for production-asset iteration. The gameplay track keeps validating on the prototype, while the art track gradually replaces and refines on that same prototype. The two tracks run at once, rather than one after the other in series. I’ll expand on this in section four; it’s where the whole piece lands.

Strung together, these four come down to one sentence: reuse lets you validate gameplay faster, cheaper, and more truthfully, and lets validation run in parallel with production. But if the value stopped here, reuse would still be just an “efficiency tool.” What really makes me see it as a new mode of production is the group below.

Incremental Value: The Qualitative Shift Beyond Efficiency

The through-line of the second group is no longer “cut waste” but “expand what’s possible.”

Value 1: cross-project reuse of existing assets. This is the biggest incremental gain in the article, and the one leap unique to the AI era.

In the traditional sense, “reuse” means digging through your own project’s old library. But once AI can semantically understand, retrieve, and rework assets from any source, the boundary of your material expands from “this project” to “all the inventory you can get” — other projects, other sources, all become your material library. The qualitative shift here isn’t about “saving,” it’s that you can stitch together new expressions that no single project could ever give you. That’s the leap from “cutting waste” to “creating new possibilities.”

Going further: once the material boundary opens up, inventory is no longer merely consumed — it begins to be re-produced. You don’t just take ready-made assets and use them, you can also generate new ones by referencing how they were made — that thread we’ll pick up in section three, but the seed is planted right here.

And it doesn’t stop at art assets. Once the material boundary opens, mechanics-layer resources can be called across sources too — level-layout rules, spawn configs, and AI behavior trees validated in other projects can likewise be pulled in and assembled. Reuse thus extends from “art asset reuse” into “whole-asset reuse.” This matters, and section three will cover data and logic assets specifically.

Value 2: parallel option validation & comparative selection. Because you don’t have to make new assets for every option, a single gameplay idea can be stitched into three different expressions at once and validated together. Trial-and-error thus shifts from “betting once in series” to “selecting the best in parallel.” The real core of this value isn’t “just try a few more,” it’s that it makes comparative validation possible for the first time — with a single option you can only ask “does this work,” with multiple options you can ask “which is better, and why.” Validation upgrades from “pass / reject” to “select-best + attribute-why.”
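To make “select-best + attribute-why” concrete, here is a minimal sketch. Everything in it is assumed for illustration: the option names, the two metrics, their scores (imagine averaged playtest ratings), and the function `compare_options`. It ranks the options, then reports which metric opened the gap between the winner and the runner-up.

```python
def compare_options(results, weights):
    """Rank options by weighted score and name the metric that decided the winner."""
    def score(name):
        return sum(weights[m] * results[name][m] for m in weights)

    ranked = sorted(results, key=score, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    # Per-metric weighted gap between the winner and the runner-up.
    gaps = {m: weights[m] * (results[best][m] - results[runner_up][m]) for m in weights}
    deciding_metric = max(gaps, key=gaps.get)
    return best, runner_up, deciding_metric


# Hypothetical ratings for three expressions of the same gameplay idea.
results = {
    "industrial": {"readability": 0.8, "pacing": 0.6},
    "old_town": {"readability": 0.6, "pacing": 0.7},
    "harbor": {"readability": 0.5, "pacing": 0.5},
}
weights = {"readability": 1.0, "pacing": 1.0}

best, runner_up, why = compare_options(results, weights)
print(f"{best} beats {runner_up}, mainly on {why}")
```

With a single option there is no runner-up, so there is nothing to attribute. That is the practical difference between “pass / reject” and comparative validation. The weights remain a human judgment.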

Value 3: validation data feeding back into asset-production decisions. The reuse process exposes one thing: which expressions are needed over and over, and which inventory never gets used. That in turn tells art — what production assets should be made first. For the first time, asset-production decisions have real evidence from the gameplay side, rather than the art director guessing from experience. This is the same philosophy as the overview piece’s “validate whether content is worth making first,” just extended to art-production decisions.
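One simple way to picture this feedback is a usage tally over validation sessions. The sketch below is illustrative only: the session log format, the expression names, and the function `rank_production_priorities` are assumptions, not a description of any real tool.

```python
from collections import Counter


def rank_production_priorities(sessions, inventory):
    """Return (expressions ordered by how many sessions used them, never-used inventory)."""
    usage = Counter()
    for session in sessions:
        # Count each expression once per session so one busy session cannot dominate.
        usage.update(set(session["used"]))
    ranked = [name for name, _ in usage.most_common()]
    unused = sorted(set(inventory) - set(usage))
    return ranked, unused


# Hypothetical log: which inventory expressions each reuse prototype pulled in.
sessions = [
    {"id": "urban_v1", "used": ["low_cover", "rooftop", "low_cover"]},
    {"id": "urban_v2", "used": ["low_cover", "alley"]},
    {"id": "urban_v3", "used": ["low_cover", "rooftop"]},
]
inventory = ["low_cover", "rooftop", "alley", "fountain"]

ranked, unused = rank_production_priorities(sessions, inventory)
print("make first:", ranked)
print("never used:", unused)
```

The ranked list is evidence for the art side, not a verdict: whether the most-used expression really deserves production budget is still a human call.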

Value 4: style consistency safeguarding the validation signal-to-noise ratio. This one is easy to underrate. On the surface it’s “reuse from the same inventory and the style is naturally unified,” but what it really affects is the credibility of your validation conclusions. Imagine a validation prototype where ten assets carry ten different flavors. When the result comes back as “the image is messy, readability is poor,” is that a gameplay problem or an art-mismatch problem? You can’t say. Reusing from the same inventory locks the style variable, and only then is the signal-to-noise ratio of the validation conclusion high. So it’s actually a prerequisite for “high-fidelity validation”: when the style is inconsistent, the so-called fidelity is just distortion.

Value 5: inventory assets back-deriving production standards. In one sentence — AI makes implicit standards explicit: those naming and spec “unwritten rules” scattered in veterans’ heads, never put to paper, get distilled into an executable standard. It’s not a standalone value, it’s the execution bedrock that lets the earlier values truly land.

Beyond these, there are two secondary value threads worth a mention but not a deep dive: the compounding value of the asset library — every reuse and rework deposits new variants and new combination rules back into the library, so the more it’s used the richer it grows and the better it understands the project’s needs; and lower cross-functional alignment cost — hand people a near-finished prototype rather than a document or a pile of grayboxes, and everyone gets it instantly, discussing around the same “looks the part” thing instead of each filling in the gaps in their own head.

Boil the two groups down to one sentence: asset reuse isn’t a cost-saving tool, it’s a new mode of production — it frees gameplay validation from being choked by art capacity, and lets you make things you couldn’t have made before.

2. How to “Put Inventory to Appropriate Use”

The value is clear; next comes method. But there’s an easy place to veer off here, and I want to call it out first.

When people hear “asset reuse,” the first reaction is often “how do I find the stuff” — build indexes, add tags, do retrieval. These matter, of course, but they’re infrastructure, not the body of the workflow. If you treat “finding” as the trunk, what you’ll write is an article on “how to build an asset library,” not one on “how to validate gameplay with assets.”

The real center of gravity of the workflow is once you’ve found it, how to put it to appropriate use. “Being able to find it” is just the entry prerequisite; “using it appropriately” is where the craft lives. So in the workflow below, I compress “retrieval” into a one-line prerequisite, put the trunk on “use,” and tell it through one example that runs all the way through — validating the rhythm of an urban skirmish encounter.

Entry prerequisite: assume you already have an inventory library that can be searched by intent (semantic annotation and indexing are its infrastructure, not covered here). What follows focuses on: once you’ve found it, how to use it.

Step 1: Pin Down the Validation Intent

The first step of the workflow isn’t to dig through the library, it’s to ask yourself: what exactly am I validating this time?

Whether you use something appropriately depends entirely on the validation goal. First anchor “what gameplay or experience am I validating this time,” then back out from it “to validate this, what kind of expression do I need, and to what ‘good enough’ degree.” This step is a judgment AI can’t replace, but once you’ve set the goal, AI can help you list out the expressions you’ll need.

Example · I want to validate whether the spatial rhythm of an “urban skirmish encounter” is right. Backing out from that, the expression I need is: a city block with cover, with verticality, with line-of-sight occlusion. Note the “good enough” bar — it doesn’t need to be beautiful, but the spatial relationships must be real and readable, otherwise the rhythm you validate doesn’t count.

Step 2: Match + Rework as Stand-In

Only with intent in hand do you go into the library for material. There’s a fundamental shift worth naming here: traditional reuse is a human flipping through the asset library one by one; AI-era reuse is intent-driven recall, where you say “I want a city block with cover and line-of-sight occlusion,” and AI pulls back the near matches. But even so, the point of this step is still not “finding it,” it’s “reworking it appropriately enough to validate.”

Retrieve and recall near-match assets from inventory, let a human judge whether they’re good enough; then do surface-attribute redefinition — adjust materials, textures, color, specs, to make it “fit the flavor.” There’s one boundary you must hold here: only change the surface, don’t rebuild the mesh. Rebuilding the geometry isn’t reuse anymore, that’s making new. In this step AI handles recalling candidates and batch-generating parameterized variants; the human picks and sets the style baseline.

There’s another case: inventory has nothing close enough. Here, the external asset has a second use — it’s not just material to use directly, it’s also a “method reference.” You can reference its specs, its stylistic thinking, its organization, and use AIGC to generate the exact kind of asset you actually need. This isn’t replicating a specific asset (that’s the copyright red line), it’s learning “how it’s done” — turning a validated method into a new batch of your own material. When inventory turns up nothing right, the reuse flow doesn’t break off here; it extends naturally from “using the ready-made” to “generating in the manner of the ready-made.”

Example · Take a city-block scene from a certain open-world title. The rework-as-stand-in does this: adjust material style and color grade to fit this project’s tone; swap the skybox, tune the lighting mood. The geometry isn’t touched at all — because the block’s spatial structure is exactly the core part I’m reusing.
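The two halves of this step, recall by intent and surface-only rework, can be sketched in a few lines. This is a toy under stated assumptions: tag overlap stands in for real semantic retrieval, and the `Asset` fields, asset names, and the `recall` and `rework` helpers are all hypothetical. The one thing it tries to show faithfully is the boundary: `rework` accepts surface attributes and refuses anything that would touch the mesh.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Asset:
    name: str
    tags: frozenset   # semantic annotations, assumed to exist already
    mesh_id: str      # geometry reference, never changed by rework
    material: str
    color_grade: str


SURFACE_FIELDS = {"material", "color_grade"}


def recall(intent_tags, library, top_k=2):
    """Rank assets by tag overlap with the intent (a stand-in for semantic search)."""
    intent = frozenset(intent_tags)
    scored = [(len(intent & a.tags), a) for a in library]
    scored = [(s, a) for s, a in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1].name))
    return [a for _, a in scored[:top_k]]


def rework(asset, **surface_changes):
    """Return a variant with new surface attributes; reject any other change."""
    illegal = set(surface_changes) - SURFACE_FIELDS
    if illegal:
        raise ValueError(f"not a surface attribute: {sorted(illegal)}")
    return replace(asset, **surface_changes)


library = [
    Asset("city_block_a", frozenset({"city", "cover", "occlusion", "vertical"}),
          "mesh_017", "wet_asphalt", "cold"),
    Asset("village_square", frozenset({"cover", "open"}), "mesh_042", "dirt", "warm"),
    Asset("space_corridor", frozenset({"scifi", "interior"}), "mesh_101", "metal", "neutral"),
]

candidates = recall({"city", "cover", "occlusion"}, library)   # the human picks from these
stand_in = rework(candidates[0], material="dusty_concrete", color_grade="warm")
```

Making `Asset` immutable and routing every change through `rework` is one way to keep “only change the surface” from eroding over time; a real pipeline would enforce the same rule at whatever layer edits materials.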

Step 3: Assemble into a Validation Scene by Rules

A single asset reworked still has to be stitched into something runnable, reproducible, and reparameterizable.

Trim, splice, and cut by rules to assemble the material into a complete, playable scene. Here we answer a question planted in Step 2: if you “only change the surface, don’t rebuild the mesh,” what happens when you genuinely need new geometry? The answer is in this step: when you need new geometry, generate it procedurally with PCG, or fill it in with a basic geometry proxy. Fine modeling is left to the production-asset track; the validation phase only needs “good enough” geometry. AI instantiates and assembles by config and fills geometry with PCG; the human sets the rules and recipes, and doesn’t place things by hand.

Example · Cut out the one block I need, delete the irrelevant buildings; per the encounter’s needs, fill in cover with PCG; then reuse a spawn config from a certain project to lay down spawn points and patrol routes. Note that last step — even a mechanics-layer resource like the “spawn config” is reused across projects. With that, I’ve stitched together an encounter you can actually fight.
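“Runnable, reproducible, and reparameterizable” is easiest to see in code. The sketch below is a deliberately small stand-in for a real PCG graph: a grid, a crop rectangle, seeded random cover placement, and a reused spawn list. The recipe keys, the grid model, and `assemble_scene` are assumptions made up for this illustration.

```python
import random


def assemble_scene(recipe, spawn_config):
    """Build a scene description from a recipe; the same recipe gives the same scene."""
    rng = random.Random(recipe["seed"])
    x0, y0, x1, y1 = recipe["crop"]   # the part of the block to keep
    cells = [(x, y) for x in range(x0, x1) for y in range(y0, y1)]
    # Reused spawn points inside the crop are kept; the rest are dropped.
    spawns = [tuple(p) for p in spawn_config["points"]
              if x0 <= p[0] < x1 and y0 <= p[1] < y1]
    # Proxy cover fills free cells, never on top of a spawn point.
    free = [c for c in cells if c not in spawns]
    cover = sorted(rng.sample(free, recipe["cover_count"]))
    return {"bounds": recipe["crop"], "cover": cover, "spawns": spawns}


recipe = {"seed": 7, "crop": (0, 0, 6, 4), "cover_count": 5}
spawn_config = {"points": [[1, 1], [5, 3], [9, 9]]}   # borrowed from another project

scene = assemble_scene(recipe, spawn_config)
denser = assemble_scene(dict(recipe, cover_count=8), spawn_config)   # reparameterize
```

The human edits `recipe`; nothing is placed by hand. Because the seed is part of the recipe, any scene that produced an interesting playtest can be rebuilt exactly.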

Step 4: Validate → Decide → Iterate

The point of using is to validate, and the result of validation drives how you use it next round.

Run it, and check whether rhythm, readability, feel, and emotion hit the goal. Pass — this direction is worth investing production assets in, and the reuse prototype becomes the blueprint for the production track; fail — swap material, swap combinations, reassemble and re-validate, and because it’s reuse, the cost of this re-stitch is tiny.

Example · Run an encounter and check the urban rhythm. If the rhythm feels empty, shrink the block, add cover, reparameterize, reassemble, re-validate; if the rhythm is right, this layout becomes the blueprint for the production scene track.
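The loop itself can be written down as a sketch. One loud caveat: `playtest_verdict` below is a placeholder for a human judgment. It reduces “the rhythm feels empty” to a cover-density threshold purely so the example runs; the thresholds, parameter names, and adjustment steps are all invented.

```python
def playtest_verdict(params):
    """Placeholder for a human playtest: 'empty' here just means low cover density."""
    density = params["cover_count"] / (params["width"] * params["depth"])
    if density < 0.20:
        return "empty"
    if density > 0.35:
        return "cramped"
    return "pass"


def iterate(params, max_rounds=10):
    """Validate, decide, reparameterize, and repeat until the verdict is a pass."""
    history = []
    for _ in range(max_rounds):
        verdict = playtest_verdict(params)
        history.append((dict(params), verdict))
        if verdict == "pass":
            break
        if verdict == "empty":
            # Shrink the block and add cover, as in the example above.
            params = dict(params,
                          width=max(4, params["width"] - 2),
                          cover_count=params["cover_count"] + 2)
        else:
            params = dict(params, cover_count=max(0, params["cover_count"] - 2))
    return params, history


start = {"width": 12, "depth": 8, "cover_count": 6}
final, history = iterate(start)
for params, verdict in history:
    print(params, verdict)
```

Each round changes only parameters, which is why a failed round is cheap: reassembly is a rerun, not a rebuild. The final parameter set is what would be handed to the production scene track as the blueprint.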

The same workflow applies to animation: take a set of attack animations from a certain action game, retarget them to this project’s character, tune the rhythm, tune the wind-up, and validate the feel of a melee combo — if it passes, this animation set serves as the reference for the production animation. Models give “form,” animation gives “soul”; reuse without animation can only validate half.

The urban skirmish is just one sample. This same flow — “pin down intent → rework as stand-in → assemble → validate” — also applies to validating the attack-defense rhythm of a boss fight, the exploration routes of an open world, the layout of an extraction point — all that changes is the validation goal and the material drawn upon; the flow itself stays the same. That’s exactly its generality as a methodology tool.

The Difference from the Traditional Flow Is Where the Investment Decision Sits

Put this workflow side by side with the traditional flow and the difference is one sentence: art investment shifts from an “up-front cost” to a “downstream decision.”

The traditional flow is “greenlight → art deliverable (up-front cost) → integrate → validate gameplay.” Art becomes the bottleneck of validation, gameplay has to wait for assets to be finished before it can run; and the money is spent before validation, so when validation fails, the up-front investment is sunk. The reuse flow is “retrieve inventory → rework and assemble → validate gameplay → invest in production assets (downstream decision).” Gameplay doesn’t wait on art, validation can run any time; money is spent on validated items, and trial-and-error is nearly zero-cost.
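A back-of-the-envelope model makes the shift visible. The numbers are made up and the model is a simplification I’m assuming for illustration: one gameplay idea, one validation round, and a fixed probability that it passes.

```python
def traditional_cost(asset_cost, validate_cost, p_pass):
    """Assets are built before validation, so asset_cost is always spent."""
    spend = asset_cost + validate_cost
    expected_sunk = (1 - p_pass) * asset_cost   # lost whenever validation fails
    return spend, expected_sunk


def reuse_cost(rework_cost, asset_cost, validate_cost, p_pass):
    """Validation runs on reworked inventory; production assets are funded only on a pass."""
    expected_spend = rework_cost + validate_cost + p_pass * asset_cost
    expected_sunk = (1 - p_pass) * rework_cost
    return expected_spend, expected_sunk


# Hypothetical units of cost, with an even chance that the idea validates.
trad_spend, trad_sunk = traditional_cost(asset_cost=100, validate_cost=5, p_pass=0.5)
reuse_spend, reuse_sunk = reuse_cost(rework_cost=10, asset_cost=100, validate_cost=5, p_pass=0.5)

print(trad_spend, trad_sunk)     # 105 50.0
print(reuse_spend, reuse_sunk)   # 65.0 5.0
```

In this toy model the expected sunk cost falls from 50 to 5. It is not zero, since rework still costs something, but the large item only gets spent after the evidence is in.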

This isn’t “AI helps you make assets faster,” it’s letting you know, before you spend the money, whether the money should be spent at all.

3. What’s Reusable Is More Than Art Assets

In the section two workflow, the urban skirmish example quietly planted a setup: in the assembly stage, what I reused wasn’t only the city block (an art asset), but also the spawn config (a mechanics asset). That wasn’t an offhand aside, it’s a deliberate claim of this article —

Reusable inventory isn’t just art assets; data and logic assets can likewise be called across projects.

This is exactly section one’s “Value 1” landing at the mechanics layer. Below I lay out reusable assets in two categories: art assets are the main body (bringing validation close to the real experience), and data and logic assets are the extension (carrying “cross-project reuse” all the way to the mechanics layer).

Art Assets: Bringing Validation Close to the Real Experience

Art assets are the main body of this piece. The categories below span four dimensions (see / move / hear / effect), and each is tied to a validation goal.

Character / monster models carry the scale-and-proportion baseline, friend-foe readability, and the legibility of what you control. They’re the “visible” foundation of validation, and by reusing the same inventory, they lock the scale baseline and serve style consistency.
Character / monster animation carries feel and rhythm, the readability of the wind-up, and action feedback. This category is critical — models give “form,” animation gives “soul.” With only a static model, you can validate “does it look right” but not “does it feel good to hit.” And feel is exactly the anchor of the combat system in the overview piece, so animation is the key material for approaching the real experience.
Effects carry hit feedback, ability expression, and emotional value. A good share of the emotional peak is carried by effects — the “juice” a graybox can never give.
Sound carries hit feedback and emotion. Nearly half the juice lives in the sound, and reusing sound is extremely cheap and extremely high-return — it’s often overlooked, yet contributes enormously to “making the prototype feel like a game.”
Complete scene assets carry spatial rhythm, the road network, POI layout, and the exploration experience (directly echoing the level-design chapter). Their reuse is special: trimming, splitting, recombining belong to the “assemble by rules” step, and when new geometry is needed, go through PCG or a basic proxy.
UI / icons carry information delivery and readability. Reusing a ready-made UI kit makes a prototype “feel like a game” rather than a bare scene, and directly lowers cross-functional alignment cost.
Material / texture libraries are different from the above — they’re not finished material used directly, they’re the raw stock for “parameterized rework.” Swap the material = swap the style = stitch multiple variants from one stock; what it supports is parallel option validation.

Data and Logic Assets: Carrying Reuse to the Mechanics Layer

If reuse stopped at art, this article would just be “how to save art labor.” What truly makes it “whole-asset reuse” is the layer below — mechanics-layer resources can likewise be called across projects.

Level / data assets (level-layout data, spawn configs, numbers tables) support validating level rhythm, the difficulty curve, and balance ratios. Reusing a validated config from another project as a starting point beats starting from blank by a mile.
AI / behavior-logic assets (behavior trees, state machines, perception and decision configs) support validating enemy behavior, the engagement OODA loop (observe, orient, decide, act), and threat sense (echoing the enemy-systems chapter). Reusing a mature behavior skeleton lets you quickly stand up a combat-ready enemy prototype.
Interactable logic skeletons (pickups, door switches, vehicle control, ability-cast skeletons) support the interaction loop of a gameplay prototype. Reusing ready-made logic skeletons makes prototypes stand up faster.

A boundary to draw here, to avoid clashing with the later engineering-focused chapters: this section is about reusing other people’s ready-made logic / data to validate quickly, while the engineering chapters are about “AI writing code for customization.” One borrows the ready-made, the other builds from scratch — different angles, no conflict. Reusing the ready-made ≠ writing new code.

There’s a further point hidden here. Once the asset boundary is opened, what’s truly reusable is often not the model, but the experience — the spawn ratios validated in other projects, the tuned behavior trees, the level-layout rules that have settled out — these “experiences” crystallize into data and logic assets that you call across projects. The model is the shell of the experience, and what you reuse is really the validated judgment inside the shell. This is the same face as Value 5’s “back-deriving production standards” — what reuse reuses is what’s left behind after someone else hit the potholes and made the mistakes.

Precisely because what’s reused is experience rather than shell, the workflow’s “reference the method, generate with AIGC” makes sense: what you read out of an external asset is a validated method, and AIGC lets you grow that method into your own material at scale. The production standards back-derived in Value 5 become exactly the yardstick constraining AIGC output here — making the generated assets conform to project specs from the start. And so the loop closes: learn the method from inventory, constrain generation with the standard, make the output reusable.

At bottom, the value of AIGC isn’t to replace the asset library, it’s to turn the experience inside the library into an infinitely expandable material space. Reuse lets you stand on experience others have validated; generation lets that experience no longer be limited by “whatever happens to be in the library” — reuse and generation are unified at the layer of “experience.”

Boil this section down to one sentence: art assets bring validation close to the real experience, while data and logic assets carry “cross-project reuse” all the way to the mechanics layer — only with all assets working together can you support one complete gameplay validation.

4. Dual-Track Parallel: How the Reuse Prototype Feeds Production Assets

By now the first three sections have covered “why reuse,” “how to use it,” and “what can be reused.” But one most critical question is still unanswered:

That validation prototype you stitched together from reuse — once validation is done, do you just throw it away?

If you throw it away once validation’s done, then reuse is still a one-off tool, of limited value. What this article really wants to say is that it shouldn’t be thrown away — the reuse prototype should become the starting point and reference for production-asset iteration. The reuse-validation track and the production-asset track are two parallel tracks, not one-after-the-other in series.

To say it again: external assets are merely proxy assets for validation during development, replaced by self-made assets once gameplay passes validation — they were never the final product, and that is precisely the meaning of “dual-track parallel.”

The Two Tracks

Reuse-validation track (gameplay side): inventory assets called across projects → retrieve / rework / assemble → the reuse prototype supports the development and testing of gameplay → arrive at a validation conclusion (if it passes, it’s worth investing; if not, swap options and reassemble). This track is the workflow from section two; it runs continuously across the whole development cycle, with gameplay iterating on top of it.

Production-asset track (art side): with the reuse prototype as blueprint (clear on “what it should become,” specs set per the back-derived standards) → self-made assets replace the external proxy assets one by one → geometry / material / effects are refined and polished, pushed toward finished quality → production assets take shape, and all proxy assets exit. Fine modeling, and polishing toward finished quality, all happen on this track.

The key is: these two tracks advance at the same time. The gameplay track keeps validating, the art track replaces and refines on the same prototype. Gameplay won’t stall because art isn’t ready, and art won’t go off and build in a vacuum detached from gameplay.

What Meshes the Two Tracks

Saying “parallel” isn’t enough; there has to be an explicit meshing mechanism between the two tracks, otherwise it’s just each doing its own thing. There are three interactions between them:

① Reuse prototype → starting point and reference for production assets. Production assets don’t start from zero, they replace and refine on top of the reuse prototype, step by step. The prototype already defines “what it should become,” and the production track refines against it rather than conceiving the final form out of thin air.

② Validation data → feeding back asset-production decisions. The reuse process exposes “which expressions are needed often,” which directly determines what production assets to make first. Asset production shifts from being called by gut to being driven by gameplay evidence (this is section one’s Value 3 landing in the dual-track setup).

③ Production standards → same spec on both tracks, seamless replacement. The production standards back-derived from inventory keep production assets at the same spec as the prototype. Only then can proxy assets be replaced precisely, without rework (this is the execution landing of section one’s Value 5). The premise for handing off and retiring proxy assets is “same spec.”
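“Same spec” is checkable, and a small sketch shows what such a check might look like. The standard below is hypothetical: the naming pattern, the pivot field, and the 5% bounds tolerance are placeholders for whatever rules a project actually back-derives from its inventory.

```python
import re

# Hypothetical back-derived standard.
STANDARD = {
    "name_pattern": re.compile(r"^SM_[A-Z][A-Za-z0-9]*_\d{2}$"),
    "size_tolerance": 0.05,   # production bounds may differ from the proxy by 5%
}


def can_replace(proxy, production, standard=STANDARD):
    """Return a list of spec violations; an empty list means the swap is drop-in."""
    problems = []
    if not standard["name_pattern"].match(production["name"]):
        problems.append("name does not follow the standard")
    if production["pivot"] != proxy["pivot"]:
        problems.append("pivot differs from proxy")
    for axis, (p, q) in enumerate(zip(proxy["bounds"], production["bounds"])):
        if abs(q - p) > standard["size_tolerance"] * p:
            problems.append(f"bounds differ on axis {axis}")
    return problems


proxy = {"name": "ext_block_cover", "pivot": "base_center", "bounds": (2.0, 1.0, 1.2)}
good = {"name": "SM_CoverLow_01", "pivot": "base_center", "bounds": (2.05, 1.0, 1.2)}
bad = {"name": "cover_final_v3", "pivot": "corner", "bounds": (2.6, 1.0, 1.2)}

print(can_replace(proxy, good))   # no violations
print(can_replace(proxy, bad))    # three violations
```

The proxy is exempt from the naming rule because it is on its way out; what matters is that its replacement matches it where gameplay depends on it, such as pivot and footprint, so validated layouts do not shift when the swap happens.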

Put the three interactions together and the two tracks aren’t two parallel straight lines, they’re two gears meshing and turning forward together: the reuse track keeps validating gameplay, the production track replaces and refines on the prototype, locked in the middle by “feedback + standards.”

The One-Sentence Landing

This whole mechanism boils down to one sentence:

Gameplay doesn’t wait on art.
Art never strays from gameplay.

Gameplay validation needn’t wait for production art, because the reuse prototype holds it up; and production art always orbits validated gameplay, because it takes the reuse prototype as its blueprint. This is what dual-track parallel really solves — it lets “validate that it’s fun before you pour in assets,” the through-line of the overview piece, truly land in the art link.

One boundary to add: the reuse flow doesn’t rebuild meshes, fine modeling belongs to the production-asset track; when the validation phase needs new geometry, it’s realized via PCG procedural generation or a basic geometry proxy. This is the continuation of the same toolset as the level-design chapter’s “PCG directly generating terrain and placement.”

5. Boundaries and Where It Doesn’t Apply

Writing this far, it’s time for a few honest words — this method isn’t a cure-all, it has clear boundaries and scenarios where it doesn’t apply. Spelling these out is more valuable than hyping it as a silver bullet.

First, rework doesn’t touch geometry — this is an honest capability boundary. What AI changes in the reuse flow is surface attributes, not the shape of the model. Need new geometry, and you either go PCG, or use a basic proxy, or — kick it over to the production-asset track for fine modeling. I nail this boundary down so the word “reuse” doesn’t get inflated into “AI can change anything.”

Second, there are things reuse can’t help with. When a piece of gameplay or expression is brand-new, with no near match anywhere in inventory (say, an original mechanic no one has done, or a unique visual language), reuse can’t help, and you can only make it new. The premise of the reuse flow is “there’s a near match in inventory to rework.” Admitting this premise is what makes the method credible rather than hype.

Third, copyright and sourcing must be taken seriously. This is also why the article opens with that Asset Source Disclaimer. External assets serve only as validation placeholders during development, replaced by self-made assets once validation passes — this is both copyright self-restraint and exactly the heart of “dual-track parallel”: proxy assets are born to be replaced, they were never the final product. This logic is self-consistent, holding the boundary while fulfilling the intent.

Fourth, this piece stops at “gameplay prototype.” It doesn’t touch beautiful art expression, doesn’t touch rendering, lighting, or materials — those belong to the back end of the production-asset track, and to later dedicated chapters. The entire goal of this piece is to use the lowest art cost to get gameplay rapidly validated in a “visible, tangible” state.

In Closing

Boil this piece down to one sentence:

Art doesn’t have to be beautiful to be useful — reuse lets it start feeding gameplay validation at the “good enough” stage.

The overview piece said “validate that it’s fun before you pour in assets,” but it left one question unanswered: where the validation-phase “good enough placeholder” comes from, and how to make it both fast and real. The answer this piece gives is: reuse existing assets + AI reassembly. Reuse decouples gameplay validation from art capacity, defers the investment decision, and brings validation close to the real experience; it also expands the material boundary from this project to the whole industry’s inventory, turns option validation from “betting once in series” into “selecting the best in parallel,” and turns asset-production decisions from gut calls into data-driven ones; finally, through dual-track parallel, it makes the reuse prototype more than a validation throwaway — it becomes the starting point of production-asset iteration.

And the premise of all this hasn’t changed: AI executes; humans judge. What to reuse, what to rework it into, whether validation counts as passed — these are always the human’s judgment; what AI does is make the road “from inventory to validation” many times faster.

What this piece really wants to say is, in fact, a bit bigger than “art.” It starts from “reusing assets” and walks all the way to “reusing experience” — when you generate by referencing methods others have validated, and constrain with back-derived standards, what you reuse is no longer some model, it’s the judgment someone left behind after hitting the potholes. This road from assets to experience is where AI truly changes the mode of production.

This is the first stop of the practice series. Next, following that map from the overview piece, I’ll walk into combat, enemies, levels, the AI Director, and AI Coding one by one — this piece is about first walking the street of “from asset reuse to experience reuse” all the way through.

This piece lays out the method and the logic. In what follows, I’ll take a concrete case and run this flow end to end for you — starting from a pile of external inventory, how you retrieve, modify, and assemble it step by step to support a real round of gameplay validation, and then let it feed the production assets. With the method covered, it’s time to see how it actually plays out in practice.

At bottom, what AI changes has never been asset production itself, but making experience reusable at scale for the first time. Models, animation, and effects will date and be replaced; but a validated method, a tuned set of ratios, a rule distilled only after hitting the potholes — once these can be retrieved, referenced, and generated, their value is no longer trapped inside any single project.

And if I had to leave the one thing this piece most wants to say as the final line:

The most valuable asset of the future may not be a model.
It may be validated experience.

AI-Native Game Development in Practice · The Full Gameplay-System Map

June 7, 2026 · June 1, 2026 by trace yang

This isn’t an article about what AI *can do*.

There are already too many of those lists — AI can write code, generate textures, run simulations. But piling those capabilities together does not automatically produce a development method that actually ships. The real question is: when you want players to feel a certain way, how does AI help you turn “the experience you want” into “a playable, validated prototype”?

In my previous post, *AI-Native Game Development — From Pipeline Refactoring to Dynamic Experience*, I made an argument: what AI Native really changes isn’t AI-generated assets — it’s that the game industry, for the first time, is shifting from “a competition over asset-production capacity” to “a competition over experience-validation speed.” The most competitive team isn’t the one with the most art assets; it’s the one that can fastest validate *what’s fun*.

That post was about why. This one is about how.

If validation speed is the new battleground, the next question is the one posed above: how to get from “the experience you want” to “a playable, validated prototype.” The previous post discussed combat, enemies, and levels as separate pipelines; this one pulls them into one unified framework, so that “from experience to validation” walks the same path in every system.

So this post is a map. For each system I’ll lay out how it starts from an experience goal, how AI assists in delivery, and how it gets validated — including why it’s done this way, and where the boundaries are. Later I’ll expand each system with concrete examples; this post first stands up the whole framework.

One thread running through everything

Whether it’s combat, enemies, or levels, the method is actually the same line:

Experience Goal → Function + Values (config-driven) → AI-Assisted Delivery → Validate & Iterate

These four steps look plain, but the order and the division of labor in each one matter.

Step one, the experience goal. First get clear on what you want players to feel — game feel, emotion, pacing, progression. This is the starting point, and it’s the judgment AI cannot make for you. Many teams skip this step and jump straight to building features, only to discover halfway through that it’s “not fun” — and they can’t articulate exactly what’s not fun, because they never defined what “fun” was supposed to be. An experience goal isn’t an empty phrase; it has to be translatable into something verifiable, or the next three steps have no target.

Step two, function + values. Code defines the skeleton; values define the expression. The key discipline here: iterate by tuning values, not by repeatedly changing code. Every code change risks introducing bugs and needs retesting and re-review; a value change is a hot reload that doesn’t touch logic, so its risk is far lower. The most reliable approach, it turns out, is to write less code and change it less often: expose everything that might change as config at the moment you write the code.

Step three, AI-assisted delivery. AI finds references, translates, generates in bulk. Note the division of labor — humans decide, AI executes. AI doesn’t judge “whether this value is good”; it turns your judgment into reality, fast.

Step four, validate & iterate. Machine runs + human tests, compared against the experience goal. Below target — go back and fix. This forms a closed loop.

The single most important line on this thread: validate that it’s fun first, then invest in assets. The biggest waste in traditional development is the art taking three weeks before anyone realizes the gameplay direction was wrong — by which point the cost of tearing it down is enormous, so you’re forced to “bet on the direction.” But when gameplay can be validated quickly with placeholder assets + values, you discover the wrong direction the moment it appears, change a parameter, and no one gets hurt — the cost of rejecting a bad direction is near zero. That’s the fundamental change AI brings to gameplay development: not making you build faster, but letting you reject mistakes earlier.

The three systems below are concrete expansions of this thread. The only difference: each system anchors to a different “experience.”

Combat System: the anchor is the combat experience

The combat system’s experience anchor is the combat experience, and game feel is the *means* to achieve it. Game feel on its own is too vague; in practice it splits into two layers: combat mechanics, with function (hit detection, damage, abilities, Buffs) and values (TTK, crit, recoil, cooldown), and 3C (Character, Camera, Control). Each layer is a function-plus-values pairing: function holds up the skeleton, values determine the feel. Game feel ultimately serves the combat experience that keeps players wanting one more fight.

There’s a counterintuitive but critically important order here: function comes before values.

A concrete example. The same “ADS sensitivity 1.2” means completely different things under different functional implementations: does the camera FOV transition linearly or with easing? Does sensitivity scale proportionally with FOV (is there an ADS multiplier)? Can you fire mid-aim-transition, and what happens if it’s interrupted? These are all function-layer design. If the function isn’t settled, the “1.2” you’re tuning has no anchor — it’s stable under one implementation and drifts under another. So the correct order is: get the function clear, implement it, expose the parameter interfaces well, and *then* values mean something.
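To make the “no anchor” point concrete, here is a minimal Python sketch. The function name `effective_turn_rate`, the FOV numbers, and the tangent-ratio scaling rule are my assumptions for illustration, not the project’s implementation; the only thing it shows is that one config value produces two different behaviors depending on a function-layer decision.

```python
import math

def effective_turn_rate(sensitivity, hip_fov_deg, ads_fov_deg, scale_with_fov):
    """Turn-rate multiplier actually applied while aiming down sights.

    Assumption: when scale_with_fov is True, the configured value is scaled by
    tan(ads_fov / 2) / tan(hip_fov / 2), one common way to keep on-screen
    motion consistent across zoom levels. Otherwise it is used as-is.
    """
    if not scale_with_fov:
        return sensitivity
    hip = math.tan(math.radians(hip_fov_deg) / 2)
    ads = math.tan(math.radians(ads_fov_deg) / 2)
    return sensitivity * ads / hip

# Same config value, two functional implementations (FOVs are placeholders).
raw = effective_turn_rate(1.2, hip_fov_deg=90, ads_fov_deg=55, scale_with_fov=False)
scaled = effective_turn_rate(1.2, hip_fov_deg=90, ads_fov_deg=55, scale_with_fov=True)
print(f"ADS sensitivity 1.2 -> unscaled {raw:.3f}, FOV-scaled {scaled:.3f}")
```

Until the team decides which branch the game ships with, tuning “1.2” up or down is tuning two different things.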

I sum up AI’s role in the combat system in one line: AI finds references, humans decide.

Here’s how it runs. The designer describes intent in genre language — say, “match the shooting feel of CoD, but with more agile movement.” Note: the designer states intent, not specific numbers. Then AI does two things: first it understands the functional mechanics (translating “what CoD’s ADS does” into “what we need to implement” — FOV transition curve, multiplier, can-you-fire-mid-switch), then it assists implementation (generating runnable code skeletons, exposing all tunable parameters, writing hot reload and unit tests). Once function lands, AI then extracts values: pulling parameters from the reference, applying a differential bias for the “more agile movement” requirement (movement +15%, turn damping −20%), each item carrying a traceable source.
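The value-extraction step can be sketched as follows. This is a minimal Python illustration under stated assumptions: `TunedValue`, `apply_bias`, and every baseline number are hypothetical placeholders, not values measured from any real game. What it shows is the shape of the output: each tuned number carries its own source string so a reviewer can trace it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TunedValue:
    name: str
    value: float
    source: str  # where the number came from, so a reviewer can trace it

def apply_bias(baseline, bias_pct, reference):
    """Apply percentage offsets to reference values and record provenance."""
    tuned = []
    for name, base in baseline.items():
        pct = bias_pct.get(name, 0.0)
        source = f"{reference} baseline {base:g}"
        if pct:
            source += f", bias {pct:+g}%"
        tuned.append(TunedValue(name, round(base * (1 + pct / 100), 4), source))
    return tuned

# Placeholder baseline numbers, not measured from any real game.
baseline = {"move_speed": 600.0, "turn_damping": 0.50, "ads_time": 0.25}
bias = {"move_speed": +15.0, "turn_damping": -20.0}
for item in apply_bias(baseline, bias, reference="reference shooter"):
    print(item)
```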

Then comes the step I consider the most critical and the most often overlooked — humans and AI co-create the test cases.

Traditional “review” is one-directional: a human finds faults in AI’s output and gives feedback. But co-creating test cases is bidirectional: the human states experience intent (“in a close-range encounter, the player should be able to decide fight-or-flee within 0.5 seconds”), and AI translates that vague intent into an executable test scenario (“player at 5m, 8 rounds in the mag, enemy head-on — check whether TTK ≤ 0.5s”). The real value of this step is forcing the human to make the intuition of “I think it’s fun” explicit, into a concrete standard of “under what scenario it should behave how.” These cases become the team’s shared consensus — when the machine runs and humans test later, everyone works against the same set of cases, with no “I say it’s fun, you say it isn’t” bickering. And cases are reusable and accumulate over time — this is the first time a small team has a shot at building an AAA-grade testing baseline.
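A co-created case like the one above can be written down as data plus a check. The sketch below is a simplified Python model, not our actual test harness: `CombatCase`, `ideal_ttk`, the damage falloff, and the weapon numbers are all hypothetical, and it assumes every shot hits with the first shot landing at t = 0.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class CombatCase:
    name: str
    distance_m: float
    rounds_in_mag: int
    max_ttk_s: float

def ideal_ttk(target_hp, damage_per_hit, rounds_per_minute):
    """Shots needed and time to kill if every shot hits; first shot at t = 0."""
    shots = math.ceil(target_hp / damage_per_hit)
    return shots, (shots - 1) * 60.0 / rounds_per_minute

def run_case(case, target_hp, damage_at, rounds_per_minute):
    """Pass only if the kill fits in both the magazine and the time budget."""
    shots, ttk = ideal_ttk(target_hp, damage_at(case.distance_m), rounds_per_minute)
    return shots <= case.rounds_in_mag and ttk <= case.max_ttk_s

def damage_at(distance_m):
    # Hypothetical falloff: full damage inside 10 m, reduced beyond.
    return 30.0 if distance_m <= 10 else 22.0

close_encounter = CombatCase("close-range head-on", distance_m=5,
                             rounds_in_mag=8, max_ttk_s=0.5)
print(run_case(close_encounter, target_hp=100, damage_at=damage_at,
               rounds_per_minute=600))
```

Once a case exists in this form, changing a weapon value and rerunning the whole library is a regression test rather than an argument.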

Finally, dual-track validation.

One track is machine runs: virtual players run thousands of automated combats under the case conditions, tallying each case’s actual TTK, hit rate, win rate — validating “are the values right.” Its strengths are speed (seconds), objectivity, and volume, but it can’t test feel. The other track is human playtest: real players play with a controller, checking off the case library one item at a time, validating “does it feel good.” It catches problems like “the values are right but it hits like mush, no feedback,” but it’s slow and low-sample.

The key is that these two tracks are a division of labor, not a substitution: the machine filters values, the human tunes feel. The human’s starting point is already a machine-filtered good version, so they only need to fine-tune the final 5%.
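The machine-run track is, at its simplest, a seeded Monte Carlo loop. Here is a minimal Python sketch assuming a single hit-chance number stands in for the virtual player’s aim; `simulate_ttk`, `run_batch`, and all parameters are hypothetical, and a real run would drive the game itself rather than this toy model.

```python
import random
import statistics

def simulate_ttk(rng, target_hp, damage, rounds_per_minute, hit_chance, mag_size):
    """One automated engagement; returns TTK in seconds, or None if the mag runs dry."""
    hp = target_hp
    for shot in range(mag_size):
        if rng.random() < hit_chance:
            hp -= damage
            if hp <= 0:
                return shot * 60.0 / rounds_per_minute
    return None

def run_batch(n, seed=0, **params):
    """Run n engagements with a fixed seed so results are reproducible."""
    rng = random.Random(seed)
    results = [simulate_ttk(rng, **params) for _ in range(n)]
    kills = [t for t in results if t is not None]
    return {
        "kill_rate": len(kills) / n,
        "median_ttk": statistics.median(kills) if kills else None,
    }

report = run_batch(5000, target_hp=100, damage=30, rounds_per_minute=600,
                   hit_chance=0.7, mag_size=8)
print(report)
```

The fixed seed matters: when a value changes, the same batch can be rerun and the difference in the report is attributable to the value, not to luck.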

Compared to the traditional “guess a number from experience → blind-tune in-game → change → tune again” loop, what sets this method apart isn’t that “AI helps you build faster”; it’s that the starting point moves from a random spot to a market-validated baseline. The traditional way might loop 20 to 50 rounds to land on something “okay,” and with no objective reference it easily falls into a “still not quite there” death spiral. The new way has a benchmark and a case library, so results are objectively debatable and regression-testable, and it converges in one or two rounds. The cycle compresses from two-to-four weeks down to one-to-three days.

Enemy System: the anchor is the emotional value the combat experience brings

If the combat system anchors to “what combat experience the fight delivers,” the enemy system anchors to the emotional value that combat experience brings — what the player walks away with after a fight. Emotional value doesn’t come from nowhere; it grows on top of the combat experience: only with a good combat experience can the enemy create emotional peaks worth remembering inside it.

This is where the enemy system most easily goes wrong. Many teams pour effort into “how strong this enemy is, how complex its behavior tree is,” and forget a more fundamental question: does this design create a “highlight moment” for the player? The experience of a fight isn’t uniform; it’s a curve with peaks and valleys, and what players actually remember are the peaks — those “crisis → turn” moments.

I anchor it with three variables: TTK × OODA × emotional value.

TTK (Time To Kill) is the duration container of an engagement; it determines how many OODA loops the player can run.
OODA (Observe → Orient → Decide → Act) is the player’s cognitive loop during that time. Put simply: OODA is one complete cycle of the player going from “spotting a problem” to “solving it.”
The emotional peak is born in the “crisis → turn” critical moment within the OODA loop.

To make it concrete: the player gets locked on by an enemy (crisis), dodges with a slide and headshots back (turn) — satisfying. Two rounds left in the mag against a full-health enemy (crisis), lands the shot anyway (turn) — satisfying. A teammate goes down (crisis), they wipe three by themselves (turn) — that’s a highlight. None of these are about “the enemy being hard” or “the enemy being weak”; they’re about whether the enemy created a critical point within the OODA loop.

And TTK determines whether these critical points get a chance to happen. TTK too short (say 0.1s) — the player can’t observe, decide, act in time, and just gets one-shot, left with only frustration; TTK too long (over 10s) — too many OODA loops run, breeding fatigue and boredom. A just-right TTK + one complete crisis-to-turn OODA = a highlight experience.

So every function and value in the enemy system has to come back to this question: does it, within the TTK, let the player complete an OODA loop and produce an emotional peak? This rewrites the whole evaluation standard. The behavior tree isn’t just “making the AI move” — it has to create threat signals that pull the player into observing; the attacks aren’t just “dealing damage” — they need a readable wind-up that leaves reaction time; the reaction delay needs whitespace so the player’s action has room. AI with rubber-band instant-reactions is the counterexample — because the player can’t OODA in time; AI that stands still taking hits is also a counterexample — because the player doesn’t need to OODA. Values like HP and damage are the same: HP determines this enemy’s TTK, damage determines the TTK before the player gets one-shot, and both must accommodate at least one or two OODA loops.
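The relationship between HP, TTK, and OODA loops is simple arithmetic, and a small Python sketch makes the thresholds checkable. The 10-second ceiling comes from the text above; the 1.5-second OODA loop length and the DPS figure are placeholders I picked for illustration, not measured values.

```python
def engagement(enemy_hp, player_dps, ooda_loop_s):
    """TTK on this enemy and how many full-length OODA loops fit inside it."""
    ttk_s = enemy_hp / player_dps
    return ttk_s, ttk_s / ooda_loop_s

def classify(ttk_s, loops, min_loops=1.0, max_ttk_s=10.0):
    if loops < min_loops:
        return "too short"   # the fight ends before one OODA loop completes
    if ttk_s > max_ttk_s:
        return "too long"    # loops pile up into fatigue
    return "ok"

# ooda_loop_s = 1.5 is a placeholder, not a measured human value.
for hp in (20, 600, 3000):
    ttk, loops = engagement(hp, player_dps=200, ooda_loop_s=1.5)
    print(hp, f"TTK {ttk:.1f}s, {loops:.1f} loops ->", classify(ttk, loops))
```

The same check applies in the other direction: swap in player HP and enemy DPS to see whether the player survives long enough to run a loop at all.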

A single enemy’s OODA is micro. When multiple enemies combine, a macro OODA emerges — “who do I hit first,” a prioritization judgment. So the overall composition (waves, squads, AI director) should intervene at the valleys of the emotional curve: when the player is too comfortable, add pressure; when too punished, give a breather.

So how do you validate something as subjective as “emotion”?

I split validation into three steps that converge layer by layer — quantity first, quality later.

Step one, machine runs hard metrics. Virtual players run thousands of matches, tallying TTK, player reaction time, OODA loops per minute, death rate, and enemy-kill distribution. These are objective data, and each metric has an expected range (the ranges themselves are extracted by AI from benchmark games, not guessed). Cases that fall outside their expected range get flagged red and filtered out in bulk first. This step converges thousands of cases down to hundreds.

Step two, infer emotion from player behavior. This step is unique to enemy-system validation, and the cleverest. AI doesn’t “feel” emotion, but it can recognize the objective signatures of emotion happening. The principle: an emotional peak triggers specific player behavior, so you can infer emotion backward from behavior. Three concrete moves: you tell AI “a low-health comeback kill counts as a highlight,” AI translates that into a precise predicate (kill while HP < 20% and the enemy had previously attacked the player); AI writes a script that auto-scans millions of log lines, marking at which second each match triggered which signals; then it stitches these discrete signals into an emotion curve and compares against the design goal, circling the segments where “it should have peaked but didn’t.” This whole process needs no “emotion” from AI; what it does is rule definition + data scanning + comparison — all AI’s strong suits.

Step three, the LLM watches replays. The first two steps have a blind spot: they can only recognize signals you’ve predefined, and behavior is an indirect inference (“the player dodged” doesn’t equal “the player felt good”). The LLM watching replays is here to patch these two holes. Feed game footage and input logs to a multimodal model, and have it — like “a veteran player who’s watched tens of thousands of hours of game videos” — mark a timeline: 0:15-0:22 tense, 0:22-0:25 highlight (low-health comeback), 0:40 frustration (stuck). It works because in training it saw vast amounts of game footage and commentary, and recognizes “what a highlight moment looks like” — it doesn’t feel emotion, but it recognizes the visual signatures of emotion happening. To be honest: this step is still early, its accuracy is climbing fast but isn’t fully trustworthy yet, so it suits being an assistive filter, not a replacement for the human final call.

The point of these three steps working together: run only the data, and the values are right but it might still not be fun; rely only on humans, and the cost explodes. Only by dividing labor across three steps can you validate emotion both fast and accurately. And every “misaligned” point is a clear fix directive — go back to the corresponding function or value quadrant and fix it.
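The behavior-inference step (step two) is the easiest of the three to show in code. Below is a minimal Python sketch of the “low-health comeback kill” predicate and the comparison against the design goal. The event schema (`t`, `type`, `enemy`, `player_hp`) and both function names are assumptions for illustration; a real log format will differ.

```python
def find_comeback_kills(events, hp_threshold=0.20):
    """Timestamps of kills made below the HP threshold against an enemy
    that had already damaged the player. Events must be sorted by time."""
    attackers = set()
    highlights = []
    for ev in events:
        if ev["type"] == "damage_taken":
            attackers.add(ev["enemy"])
        elif ev["type"] == "kill":
            if ev["player_hp"] < hp_threshold and ev["enemy"] in attackers:
                highlights.append(ev["t"])
    return highlights

def missing_peaks(highlights, expected_windows):
    """Design windows (start, end) in which no highlight signal fired."""
    return [(a, b) for a, b in expected_windows
            if not any(a <= t <= b for t in highlights)]

# Hypothetical log excerpt; player_hp is a fraction of max HP.
log = [
    {"t": 12.0, "type": "damage_taken", "enemy": "e1", "player_hp": 0.55},
    {"t": 14.5, "type": "damage_taken", "enemy": "e1", "player_hp": 0.15},
    {"t": 16.0, "type": "kill", "enemy": "e1", "player_hp": 0.15},
    {"t": 41.0, "type": "kill", "enemy": "e2", "player_hp": 0.90},
]
peaks = find_comeback_kills(log)
print(peaks, missing_peaks(peaks, [(10, 20), (35, 45)]))
```

The second kill in this log doesn’t count: the player was at full comfort and the enemy never landed a hit, so the window where a peak was designed comes back as missing.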

Level System: the anchor is pacing and exploration

The level system’s experience anchor is pacing and exploration. But unlike the previous two systems, this layer has a particularity: it’s the assembly layer.

What a level designer actually delivers is the coordinated placement of four things: POIs (points of interest), the path network, enemies, and resources. A boundary to draw here — enemy *design* belongs to the enemy system (how strong this enemy is, how its behavior tree is written), resource *values* belong to the economy system (how much a medkit heals), but “what to place at this spot on this map, how many to spawn, which supplies to put down” — that’s the level system’s job. The level system takes the enemies the enemy system provides, the resources the economy system defines, the terrain the scene system provides, and places them at specific positions according to the experience goal.

And the core difficulty of these four things: they must coordinate, point at the same experience goal, and not cancel each other out.

A few counterexamples make it clear. If enemies are placed densely (to create tension) but resources are handed out generously (to relax), the two cancel and the player feels no tension at all. If POIs want to guide the player east (encouraging exploration) but the path network only goes west, the design intent falls flat. If a rare resource is piled at a POI, but that POI is in a dead end the path network never passes, the player can’t reach it — placed for nothing. So the real craft of level design is translating “this segment should be tense” into a recipe: combat POIs clustered + path network narrowed to a single line + high enemy density + resources cut off — all four serving “tension” together. Each recipe is a packaged rule of “experience goal → placement of the four.”
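Both kinds of counterexample, elements that cancel out and a POI the path network never reaches, can be caught mechanically before anyone plays the level. Here is a minimal Python sketch; the `Recipe` fields, the `EXPECTED` rule table, and the node names are hypothetical and far coarser than a real recipe would be.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Recipe:
    goal: str             # "tension" or "breather"
    poi_type: str         # e.g. "combat" or "loot"
    path_width: str       # "narrow" or "open"
    enemy_density: str    # "high" or "low"
    resource_level: str   # "scarce" or "generous"

# What each element must look like to serve a given goal (hypothetical rules).
EXPECTED = {
    "tension": {"poi_type": "combat", "path_width": "narrow",
                "enemy_density": "high", "resource_level": "scarce"},
    "breather": {"poi_type": "loot", "path_width": "open",
                 "enemy_density": "low", "resource_level": "generous"},
}

def cancelling_elements(recipe):
    """Names of elements that pull against the recipe's stated goal."""
    want = EXPECTED[recipe.goal]
    return [k for k, v in want.items() if getattr(recipe, k) != v]

def unreachable_pois(paths, start, pois):
    """POIs the path network never reaches from the start node (BFS)."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in paths.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return [p for p in pois if p not in seen]

tense = Recipe("tension", "combat", "narrow", "high", "generous")
paths = {"spawn": ["alley"], "alley": ["plaza"], "vault": []}
print(cancelling_elements(tense), unreachable_pois(paths, "spawn", ["plaza", "vault"]))
```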

AI’s positioning here must be stated clearly: AI doesn’t understand level design itself. I won’t dodge this. What AI does is inherit the expertise of professional designers: it takes the scattered, tacit experience of “which recipe for which experience” and distills it into a structured recipe library, called up during production and refilled after validation. At first the library may be nearly empty, but the more it’s used and refilled, the more valuable it becomes, eventually turning into a team asset. This is the same principle as the combat system’s continuously evolving test-case library: both turn tacit human experience into explicit team assets.

With the recipe in hand, how does it land? I’ll only mention one methodological point here; the concrete UE implementation is left for the level system’s hands-on post.

The core idea in one line: config-driven. All four elements of a recipe drop into config tables, and a procedural generator reads the tables: the designer edits tables to set intent, the tooling reads them to land it automatically, and re-editing a table regenerates the level without touching code. This is how the main thread’s “config-driven” step manifests in the level system: programmers expose capabilities as config so designers can iterate by editing tables, and nobody has to change code frequently.

To put it a bit more clearly: every design output (POI, path network, enemies, resources) has a corresponding landing mechanism, and what connects “design intent” to “engineering implementation” in the middle is the config table as the handoff interface. The designer, on the table’s side, expresses “what’s wanted”; engineering, on the other side, handles “how to generate it.” With the two decoupled, iteration is fast.
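As a rough illustration of the table as a handoff interface, here is a minimal Python sketch. The column names, the rows, and the uniform random placement are all my assumptions; the real UE implementation is left for the hands-on post. The point is only the shape: the designer’s side is a table, the engineering side is a function that reads it.

```python
import csv
import io
import random

# Designer-owned table (hypothetical columns and rows).
SPAWN_TABLE = """segment,kind,item,count
alley,enemy,melee_flanker,6
alley,resource,medkit,0
plaza,enemy,rifleman,2
plaza,resource,medkit,3
"""

def generate(table_text, seed=0, area_size=100.0):
    """Turn table rows into concrete placements; same table + seed, same level."""
    rng = random.Random(seed)
    placements = []
    for row in csv.DictReader(io.StringIO(table_text)):
        for _ in range(int(row["count"])):
            x = round(rng.uniform(0, area_size), 1)
            y = round(rng.uniform(0, area_size), 1)
            placements.append((row["segment"], row["kind"], row["item"], x, y))
    return placements

level = generate(SPAWN_TABLE)
print(len(level), "placements, first:", level[0])
```

Changing `rifleman,2` to `rifleman,5` and calling `generate` again is the whole iteration; `generate` itself never changes.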

Finally, how do you validate a level once it’s built? Same thing — collect data back and check against the experience goal: player trajectory, dwell time, death-point telemetry, generate a heatmap, and compare against the recipe’s intent: was the tense segment actually tense (did the four not cancel out)? Did players go to the POIs they should? Did they take the right path? Whichever recipe proves effective in real runs gets reinforced into the library; whichever flops gets its parameters corrected — the recipe library gets more accurate the more it’s used.
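The heatmap comparison can be sketched in a few lines of Python. This is a simplification under stated assumptions: it uses death count per grid cell as a crude proxy for “was this segment tense,” and `heatmap`, `check_intent`, the cell size, and the sample points are all hypothetical.

```python
from collections import Counter

def heatmap(points, cell_size):
    """Count events per grid cell; points are (x, y) world positions."""
    return Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)

def check_intent(death_map, tense_cells, min_deaths):
    """Tense cells where fewer deaths than intended were recorded."""
    return [c for c in tense_cells if death_map[c] < min_deaths]

# Hypothetical death-point telemetry from a batch of runs.
deaths = [(12.0, 3.5), (14.2, 8.0), (18.9, 1.1), (55.0, 40.0)]
hm = heatmap(deaths, cell_size=10.0)
print(hm.most_common(1), check_intent(hm, tense_cells=[(1, 0), (3, 3)], min_deaths=2))
```

A cell that the recipe marked as tense but that comes back from `check_intent` is a candidate for parameter correction; the same binning works for trajectory and dwell-time samples.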

Future Directions: what else fits the same framework

With combat, enemies, and levels covered, this framework can actually extend to more systems.

Progression, loot & economy, and quests & narrative are game systems on par with combat, enemies, and levels, and all can reuse the same main thread: Experience Goal → Function + Values → AI-Assisted Delivery → Validate & Iterate. Runtime AI Director and PCG × AI are two cross-cutting capabilities that span all systems: the former makes the game “come alive” for each player, the latter makes content “grow itself.” The Director’s architecture (wave orchestration, event injection, content mutation) was discussed in detail in the previous post, so I won’t repeat it. I’ll only restate one boundary. The previous post put it as “the LLM handles ‘what to think,’ the behavior tree handles ‘how to do it’,” which in this framework means the LLM only does between-match orchestration and never steps in to make per-tick decisions.

These are just touched on here; later posts will expand each one.

The Technical Foundation: making all of this genuinely reliable and accessible

Everything above is about “what to do.” But whether all of it can ship depends on two technical foundations.

The first foundation: how to make AI write code reliably.

My answer — reliability doesn’t come from “AI being smarter,” it comes from a fixed development workflow (an SOP): Pick Framework/Skill → Make a Plan → Execute under the Skill → Function Review → Configify → Values Review.

I’ll only call out the most core judgments here; the full workflow gets its own dedicated post, *AI Coding SOP*. One, box the AI inside the existing client framework — it’s filling in blanks, not building from an empty lot. Two, produce a design Plan for human review first, and never allow skipping the Plan to code directly. Three, and the part most relevant to the gameplay systems — two reviews: the Function Review governs “is the code correct,” the Values Review governs “are the values correct.” Function Review fails — go back and change code, expensive; Values Review fails — only change the config table, don’t touch code, dirt cheap. This is exactly how the “iterate by tuning values, not changing code” that this whole post keeps stressing lands in engineering — code is the skeleton, get it right in one pass; values are the tuning layer, change and re-check repeatedly.
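To show why a failed Values Review is cheap, here is a minimal Python sketch of one as a pure range check over a config table. The function name, the keys, and the ranges are hypothetical; in practice the expected ranges would come from the benchmark extraction described earlier.

```python
def values_review(config, expected_ranges):
    """Return {key: reason} for every value that is missing or out of range."""
    failures = {}
    for key, (low, high) in expected_ranges.items():
        if key not in config:
            failures[key] = "missing"
        elif not low <= config[key] <= high:
            failures[key] = f"{config[key]} outside [{low}, {high}]"
    return failures

# Hypothetical ranges and values for one weapon's config table.
expected = {"close_ttk_s": (0.3, 0.5), "ads_time_s": (0.15, 0.30),
            "recoil_pitch_deg": (0.5, 2.0)}
shotgun = {"close_ttk_s": 0.4, "ads_time_s": 0.45, "recoil_pitch_deg": 1.2}

print(values_review(shotgun, expected))
shotgun["ads_time_s"] = 0.25  # the fix is a table edit; no code changes
print(values_review(shotgun, expected))
```

The review function is written once and passes Function Review once; after that, every failure it reports is resolved by editing the table.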

The second foundation: letting everyone use it, not just programmers.

The first foundation cuts programmers’ cost. This one cuts everyone’s cost — and it focuses on the gameplay-design and prototyping stage.

What AI plays here is a translation layer between people and the editor. In the traditional flow, for an artist or designer to land an idea, they first have to learn a pile of complex editor operations, menus, and parameter meanings, then do it by hand in the editor. With a translation layer, they express gameplay intent in plain language (“give me a shotgun with 0.4s close-range TTK,” “this segment tense, that one a breather”), and AI translates it into actual editor operations and configs, quickly landing a playable prototype.

It cuts three kinds of cost: learning cost (no editor operations, menus, parameter meanings to learn), usage cost (no manual clicking one by one — AI batch-executes), config cost (no field names to memorize, no fear of mistakes — AI generates and validates from intent).

And this layer connects directly to the prototyping of the three systems above. The designer says “match CoD feel, more agile movement,” and AI translates it into 3C parameter config, ready to tune in-game; the designer says “flanking melee enemies, spawn in groups of three,” and AI translates it into placeholder enemies + behavior config + spawn table; the designer says “this segment tense, that one a breather,” and AI translates it into POI/path/resource config generated by PCG. Each one is “designer states intent → AI translates → playable prototype, validate immediately.”

But the boundary must be drawn very clearly: this stage only builds the “gameplay prototype,” it doesn’t touch art presentation. The art at this stage is exactly the first of the four stages from the previous post — the validation stage: providing good-enough placeholders for gameplay (blockout characters, greybox, marketplace assets, temp VFX), serving the gameplay experience, not making polished final assets. The goal is “good-enough + fast,” because this stage is about validating gameplay. The later procurement, optimization, replacement, plus art presentation, rendering, lighting and materials, are not in this section — they’re left for later chapters.

The meaning of this layer is a shift in perspective: what AI cuts is not only programmers’ cost but everyone’s cost, freeing every role from “fighting tools” and putting their energy back on “judging whether the gameplay is fun.” And what AI cuts is only operational cost, not judgment: whether the gameplay is fun is always a human call. AI executes, humans decide.

In Closing

To collapse this map into one line:

AI doesn’t replace human judgment; it makes the path “from experience to playable prototype” 10× faster.

Humans set the experience goal and make the call; AI finds references, lands them, runs validation. The three gameplay systems each have their own experience anchor — combat anchors the combat experience, enemies anchor the emotional value that experience brings, levels anchor pacing and exploration — and the anchor determines “what to validate,” and validation drives iteration. Without an experience anchor, you don’t know what to validate, and AI has nothing to collaborate on. The anchor is the starting point of everything.

This post is a map. The articles that follow will be an ongoing record of a real development process.

Not a discussion of what AI can do.

But a discussion of — how, once AI truly enters game development, a team understands systems, validates experience, refactors gameplay, and ultimately turns ideas into a game.

The map is drawn. Next, we walk into each street.

AI Native Game Dev in Practice — TPS Camera Overhaul

May 14, 2026 by trace yang

I. Why This Article

In our previous article, we discussed the methodology of AI Native game development, with the core thesis being “experience validation first” — games can be fully played and validated before production assets are finished.

Methodology sounds great on paper, but what does it actually look like in a real project?

This article answers that with a real case: we had an FPS engine and wanted to transform it into a TPS. Not building a TPS from scratch, but making an existing FPS system “feel like a TPS.”

This isn’t a TPS camera technical article. It’s a record of how AI participated in a real development process — from understanding an unfamiliar system to producing a complete refactoring plan.

II. Starting Point: An FPS Engine Wants to Become TPS

The engine we faced had been in operation for some time — built on UE, substantial codebase, multiple teams had worked on it over the years. It did have a TPP mode — there was a CameraMode component supporting FPP/TPP switching, and the camera did move behind the character when you switched.

But the moment you switched to TPP, you could immediately feel “this isn’t TPS.” Camera distance fixed at 216cm, shoulder offset only 24cm, no follow lag whatsoever — felt like a camera rigidly welded to the character’s back. Aiming switched straight back to FPP, so you couldn’t properly ADS in TPS view. Shooting had almost no camera feedback — recoil and fire shake systems existed but only worked in FPP. Hugging walls caused the camera to clip straight through. Overall it felt more like “watching an FPS from behind” rather than a genuine third-person shooter experience.

The traditional approach is familiar to everyone: designers write a camera requirements doc based on experience, list a bunch of “features to add,” engineering schedules the work, builds it, discovers it’s wrong during integration, revises, re-integrates, revises again. Months later, still tuning basic feel. And because nobody has a full picture of the existing system, there’s a good chance of reinventing wheels — some feature already exists but nobody knows.

That’s how we initially planned to approach it too.

Then we changed our thinking: instead of guessing “what needs to be built” based on experience, let AI read through the entire camera system first and figure out “what already exists.”

III. AI Isn’t “Generating Features” — It’s “Understanding the System”

This was the most surprising part of the entire process.

After AI read through the complete camera system source code, we discovered an unexpected truth: this engine’s camera framework was actually very capable.

The SpringArm component supported 15+ movement states — standing, crouching, prone, vaulting, parachuting, swimming, being carried, NPC dialogue… each with independent camera distance, offset, and height configurations, driven through a TMap called BasicLayerConfig configured in Blueprints — completely data-driven. The CameraModifier system lined up impressively: weapon recoil modifier, gun sway modifier, hit camera modifier, fire shake modifier, prone impact modifier, vehicle camera modifier… over a dozen modifiers each handling their own responsibility. The scope system had a complete CSV config table with 50+ scopes, each with its own zoom ratio, FOV, and depth-of-field parameters. FPP/TPP switching had a complete priority mechanism supporting multiple systems requesting camera mode changes simultaneously with priority arbitration.

Framework capabilities far exceeded our expectations.

But AI also uncovered another side. Having AI dig through a years-old production codebase is a lot like archaeology — you keep excavating things, some still in use, some buried, some half-finished and abandoned.

Sprint camera layer: complete acceleration/stabilization/deceleration camera change logic, roughly 60 lines of code, but entirely wrapped in #if 0. This wasn’t “never built.” It was built, then abandoned.

Recoil system: highly refined, supporting per-stance recoil patterns with spring-damper recovery. But the code returns immediately whenever it detects a non-FPP mode, so none of it runs in TPP. Whether an engine “feels like TPS” sometimes comes down to a single line of code.

ADS aiming: forces a switch to FPP, with no configuration toggle. The TPS experience breaks apart not because systems are missing, but because systems aren’t truly cooperating.

The code also contained large blocks annotated “deprecated logic” — left/right shoulder shooting camera layers, jetpack camera layers, all with configuration property declarations but commented out. A previous team had planned a complete TPS shoulder-shooting camera system, but it was shelved at some point. These ruins mixed in with active code — a production engine that’s been running for years often doesn’t look like “one system.” It looks more like ruins from multiple eras stacked on top of each other. Without AI flagging them, it’s very hard to distinguish what’s alive from what’s dead.

AI traced the call chains further down and dug up some even more unexpected things.

FPP/TPP transition timing was asymmetric — switching into FPP took 0.1 seconds (instant snap), switching back to TPP took 0.6 seconds (smooth transition), with independent easing curves for each direction. These weren’t arbitrary numbers: fast zoom-out looks jarring, so entering FPP needs to be fast while returning to TPP needs to be slow. An experience design decision embedded in code, with no documentation recording why. When AI found these numbers, we realized previous developers had put very careful thought into this.
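The asymmetry is easy to see in a small Python sketch. The two durations are the ones found in the engine; the curve choices (linear into FPP, smoothstep back to TPP) and all names are my assumptions, since the actual easing curves aren’t reproduced here.

```python
def smoothstep(t):
    """Ease-in/ease-out curve on [0, 1]."""
    t = max(0.0, min(1.0, t))
    return t * t * (3.0 - 2.0 * t)

def linear(t):
    return max(0.0, min(1.0, t))

# Durations are the ones found in the engine; the curve choices are assumptions.
TRANSITIONS = {
    "to_fpp": (0.1, linear),
    "to_tpp": (0.6, smoothstep),
}

def blend_alpha(direction, elapsed_s):
    """0.0 = still at the old view, 1.0 = fully at the new view."""
    duration, curve = TRANSITIONS[direction]
    return curve(elapsed_s / duration)

for t in (0.05, 0.1, 0.3, 0.6):
    print(t, round(blend_alpha("to_fpp", t), 3), round(blend_alpha("to_tpp", t), 3))
```

At 0.1 seconds the switch into FPP is already complete, while the return to TPP has barely started moving.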

The camera mode system had 7 priority levels. Interestingly, player manual FPP/TPP switching ranked 5th — above vehicles. Code comments explicitly stated: “Player manual switch has very high priority. If gameplay logic needs to override it, it must first call the cleanup interface.” A UX philosophy embedded in code: player choice trumps most system behaviors.
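Priority arbitration of this kind can be modeled in a few lines. The Python sketch below is a toy model, not the engine’s code: the class and source names are invented, and it assumes larger numbers win, with the manual switch at level 5 and vehicles one level below.

```python
class CameraModeArbiter:
    """Highest-priority active request decides the camera mode."""

    def __init__(self, default_mode="TPP"):
        self._default = default_mode
        self._requests = {}  # source -> (priority, mode)

    def request(self, source, priority, mode):
        self._requests[source] = (priority, mode)

    def clear(self, source):
        """Cleanup interface: a source withdraws its own request."""
        self._requests.pop(source, None)

    def current_mode(self):
        if not self._requests:
            return self._default
        return max(self._requests.values(), key=lambda r: r[0])[1]

# Priority numbers are assumptions: larger wins, manual switch sits above vehicles.
arbiter = CameraModeArbiter()
arbiter.request("vehicle", priority=4, mode="TPP")
arbiter.request("player_manual", priority=5, mode="FPP")
print(arbiter.current_mode())
arbiter.clear("player_manual")
print(arbiter.current_mode())
```

In this model, gameplay logic below level 5 cannot override the player’s choice; it only takes effect once the manual request has been cleared, which matches the intent of the code comment quoted above.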

The most unexpected discovery: AI traced the retarget logic and found that even when the player was in TPP mode, if the weapon was zoomed in, animations would silently switch to FPP mode. When aiming down sights in TPP, the character’s arm animations were actually using FPP ones. Without reading the complete chain, you’d never find this.

An engine fails to “feel like TPS” not because it lacks TPS capabilities, but because those capabilities are scattered everywhere: some disabled, some FPP-only, some with logic written but no parameters configured, some half-built and shelved, some hidden in conditional branches you’d never notice.

We initially thought “we need to build many new features.” We later discovered “most features already exist — they just weren’t being used properly.”

This discovery changed the entire project trajectory. Under traditional thinking, we would have planned a “develop new TPS camera system” project taking months. What actually needed to happen was “enable existing features + tune parameters + fix a few hardcoded blocks” — an order of magnitude less work.

Even more interesting, AI mapped out a complete configuration hierarchy: the base layer is BasicLayerConfig controlling per-stance camera parameters, the middle layer is IndoorLayerConfig automatically applying additive offsets indoors (doing an upward ray trace every 0.5 seconds to detect ceilings), and the top layer is the scope table controlling ADS FOV and depth of field. Many “camera feels wrong” problems might not need code changes at all — just table tuning. But before AI’s analysis, nobody knew the full picture of this three-layer configuration system, let alone the data flow between them.
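The three-layer hierarchy can be pictured as a small resolution function. This Python sketch is an illustration only: the dictionaries stand in for BasicLayerConfig, IndoorLayerConfig, and the scope table, the 216 cm and 24 cm figures are the ones quoted in this article, and every other number and key name is a placeholder.

```python
def resolve_camera(stance, indoors, scope, base_layer, indoor_layer, scope_table):
    """Resolve final camera parameters: base -> indoor additive -> scope override."""
    params = dict(base_layer[stance])  # copy so the base table is never mutated
    if indoors:
        for key, delta in indoor_layer.get(stance, {}).items():
            params[key] = params.get(key, 0.0) + delta
    if scope is not None:
        params.update(scope_table[scope])
    return params

# 216 / 24 are the values quoted in this article; everything else is a placeholder.
base_layer = {"stand": {"distance_cm": 216.0, "shoulder_offset_cm": 24.0, "fov_deg": 90.0}}
indoor_layer = {"stand": {"distance_cm": -60.0}}
scope_table = {"red_dot": {"fov_deg": 70.0, "dof_focus_cm": 3000.0}}

print(resolve_camera("stand", indoors=True, scope="red_dot",
                     base_layer=base_layer, indoor_layer=indoor_layer,
                     scope_table=scope_table))
```

Seen this way, a “camera feels wrong” report first becomes a question of which layer produced the offending number, and only then a question of whether any code needs to change.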

This made us rethink a question: in traditional projects, the time teams spend “developing new features” may not be as much as imagined. A huge amount of time is actually spent “figuring out what state the existing system is actually in.” In a medium-to-large codebase, who wrote some feature, who disabled it, why it was disabled, whether it can still be used, which table the parameters are in — these questions can take days to weeks through manual code review. AI compressed this process to a few hours.

And AI’s “understanding” isn’t simple code search. It can string together logic scattered across a dozen files into a complete chain: player presses aim → FSM Action triggers → CameraMode component arbitrates by priority → SpringArm state switches → BasicLayerConfig looks up corresponding parameters → interpolates to target position → CameraModifiers stack recoil/sway/hit effects one by one → final camera transform output. This chain spans Gameplay Framework, Camera, Weapon, and Animation — four modules. No human can trace this end-to-end in their head at once. AI can.
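To make the shape of that chain concrete, here is a simplified sketch of one camera tick. It assumes a single scalar (spring arm length) instead of a full transform, and every name in it (`CameraRequest`, `arbitrate`, `interp_to`, `camera_tick`) is hypothetical. It only mirrors the order of stages named above: arbitration by priority, parameter lookup, interpolation, then stacked modifiers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class CameraRequest:
    mode: str
    priority: int


def arbitrate(requests: List[CameraRequest]) -> str:
    """Return the mode of the highest-priority active request."""
    return max(requests, key=lambda r: r.priority).mode


def interp_to(current: float, target: float, speed: float, dt: float) -> float:
    """Step toward the target; alpha is clamped so the step never overshoots."""
    alpha = min(1.0, speed * dt)
    return current + (target - current) * alpha


def camera_tick(
    requests: List[CameraRequest],
    arm_by_mode: Dict[str, float],
    current_arm: float,
    modifiers: List[Callable[[float], float]],
    speed: float,
    dt: float,
) -> Tuple[str, float, float]:
    mode = arbitrate(requests)                        # priority arbitration
    target = arm_by_mode[mode]                        # config lookup
    base = interp_to(current_arm, target, speed, dt)  # interpolation
    final = base
    for modifier in modifiers:                        # recoil, sway, hit effects
        final = modifier(final)
    # base is carried into the next tick; final is what gets rendered.
    return mode, base, final
```

Keeping `base` separate from `final` is a design choice in this sketch: modifier output is not fed back into the interpolated state, so a recoil kick cannot accumulate across ticks.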

IV. From Feature List to Design Principles — Every Principle Comes From a Real Pitfall

After understanding the system’s current state, the first draft naturally became a Feature List: ADS should stay TPP, add Sprint camera, enable peek, complete TPP shooting feedback, optimize collision…

A long list.

Then we hit a problem: with so many features listed, we didn’t know what to prioritize, what to restrain versus amplify, or whose rules win when two features conflict.

What the Feature List lacked wasn’t “what to do” but “why to do it” and “what not to do.”

So we started building design principles. Not by sitting down to write a complete Camera Philosophy from scratch — nobody reads those. Instead, every time we hit a pitfall, we distilled the lesson into a principle. Five principles, each from a real problem.

Camera Philosophy: Readability > Feel > Cinematic > Flashy

We initially tried adding dynamic camera movements for a “premium feel.” Heavy follow inertia, aggressive Sprint FOV push, strong hit screen offset. It definitely looked more “dynamic.”

Then we discovered players couldn’t hit anything.

Comfort & Stability: No Fatigue Over Long Sessions, ADS Must Be Stable

We tried stronger Camera Lag and Sprint Shake. The first problem wasn’t “not exciting enough” — players started getting dizzy.

Short-term reaction: “Wow, this camera has such a cinematic feel.” After 30 minutes: “I need a break.” TPS is a genre for long play sessions. Camera comfort matters far more than short-term excitement.

Transition & Rhythm: State Transitions Need Tempo

Every camera feature worked fine in isolation. But running simultaneously, Sprint pull-back hadn’t recovered before ADS kicked in, ADS was still transitioning when recoil fired, recoil hadn’t settled when an explosion hit.

The problem wasn’t that any single feature was wrong — the camera had lost its rhythm.

Consistency: Same Action, Same Feedback, Every Time

ADS shooting feedback didn’t match hip-fire feedback. Sprint pull-back distance varied. ADS transition speed was inconsistent.

Result: players could never build stable muscle memory.

Safety Rules: Camera Must Never Break

Narrow spaces causing camera to shake wildly. ADS getting forcibly pulled open by explosion effects. Wall-hugging causing camera to clip through the character. These aren’t “bad experience” problems — they’re “experience completely collapses” problems.

V. Separating Function from Feel — AI Exhausts Possibilities, Humans Decide What’s Right

Halfway through the plan, we discovered a more fundamental issue: much of camera experience quality depends not on “whether a feature exists” but on “whether parameters are right.”

The same Sprint Camera feature — pushing FOV 2 degrees more or less, pulling the camera back 30cm versus 50cm — feels completely different. Getting the feature right is just the starting point; getting the parameters right is the finish line.

But feature development and parameter tuning require completely different skills. Feature development needs code architecture understanding, logic changes, edge case handling; parameter tuning needs repeatedly playing the game, judging by feel, making tradeoffs. AI does the former quickly and reliably; only humans can do the latter.

So we split the workflow into two tracks:

AI handles making features work — analyzing code, enabling disabled features, filling missing logic paths, generating parameter table templates with suggested defaults.

Designers handle making the experience feel right — taking AI-generated parameter tables, repeatedly playing and testing, adjusting item by item until the feel is correct.

More precisely: AI is better at “exhausting possibilities,” humans are better at “deciding what’s right.” AI can tell you “this engine has 15 camera states, 8 types of Camera Modifiers, 50+ scope configurations,” but only a human can judge “how far the camera should be from the character when standing to feel most comfortable.”

This isn’t AI’s limitation — it’s the right division of labor.

For a concrete example: after analyzing the shooting feedback system, AI told us “among the dozen-plus CameraModifiers, recoil, fire shake, and FPP bone animation only work in FPP; gun sway and joggle camera work in both FPP and TPP; hit camera works in TPP but is weak due to bone animation dependency.” Based on this analysis, AI generated a TPP recoil parameter table template categorized by weapon type. But “does scaling rifle recoil to 0.5 of FPP feel right” and “will SMG’s high-frequency shake annoy players” — these judgments only come from someone playing a few rounds with a controller.
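A parameter table template of this kind might look like the sketch below. The FPP recoil numbers, the field names, and the `make_tpp_template` helper are all invented for illustration. The 0.5 default scale is only the starting value mentioned above, and every row is flagged as unverified until someone has played it.

```python
from typing import Dict

# Hypothetical FPP recoil values per weapon type (degrees per shot).
FPP_RECOIL: Dict[str, Dict[str, float]] = {
    "rifle": {"pitch_deg": 1.2, "yaw_deg": 0.4},
    "smg": {"pitch_deg": 0.6, "yaw_deg": 0.5},
}


def make_tpp_template(
    fpp_table: Dict[str, Dict[str, float]], default_scale: float = 0.5
) -> Dict[str, Dict[str, object]]:
    """Derive suggested TPP recoil rows by scaling the FPP values."""
    template: Dict[str, Dict[str, object]] = {}
    for weapon_type, recoil in fpp_table.items():
        template[weapon_type] = {
            "tpp_scale": default_scale,
            "pitch_deg": recoil["pitch_deg"] * default_scale,
            "yaw_deg": recoil["yaw_deg"] * default_scale,
            # A suggested default only; a designer decides if it feels right.
            "status": "needs_playtest",
        }
    return template
```

The `status` column is the handoff point between the two tracks: generation fills the row, and only a playtest flips it.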

AI continuously outputs “what the system can do” and “how parameters can be tuned”; designers continuously answer “does this feel right” in-game. Two tracks running in parallel, much faster than traditional sequential feedback loops.

VI. Development Plan: First “Does It Look Like TPS,” Then “Can You Aim,” Finally “Does Shooting Feel Good”

With plan and principles in hand, the next question was: what to do first?

The biggest problem with TPS cameras usually isn’t “missing features” but “foundation rhythm isn’t established.” If the basic viewing angle is wrong, adding more advanced features is building on a crooked foundation.

We broke P0 into 7 steps in strict sequence:

Step one was basic TPP view and shoulder position. This step only answers one question: “Does it look like a TPS?” If shoulder offset is wrong or camera distance feels awkward, everything downstream will be off.

Step two was ADS: establishing a stable aiming experience. This was the highest-risk step, because it affects input, FOV, recoil, and aim stability, all of which are interconnected. We placed it second to surface problems early.

Step three was collision handling. Steps four through seven: shooting feedback, Sprint camera, shoulder switch, hit feedback.

Each step validated immediately before proceeding to the next.

Why does this order matter? Because TPS camera dependencies are directional. Wrong base view means ADS pull-in targets the wrong position; unstable ADS means shooting feedback can’t be tuned; unresolved collision means all indoor features break. Each subsequent step depends on previous steps being correct.
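The directional dependencies can be written down as a small graph and checked mechanically. The sketch below encodes only the three dependencies named in the paragraph above. The step labels are shorthand, and `indoor_features` is a placeholder for whatever runs indoors rather than one of the seven steps. It uses `graphlib` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# step -> steps it depends on. Only the edges named in the text are listed.
DEPENDS_ON: Dict[str, Set[str]] = {
    "base_view": set(),
    "ads": {"base_view"},
    "collision": set(),
    "shooting_feedback": {"ads"},
    "indoor_features": {"collision"},
}


def respects_dependencies(order: List[str], depends_on: Dict[str, Set[str]]) -> bool:
    """True if every step appears after all of the steps it depends on."""
    position = {step: i for i, step in enumerate(order)}
    return all(
        position[dep] < position[step]
        for step, deps in depends_on.items()
        for dep in deps
    )


# One order that satisfies the graph; other valid orders may exist.
one_valid_order = list(TopologicalSorter(DEPENDS_ON).static_order())
```

A check like `respects_dependencies` is useful when the plan is reshuffled later: it flags a proposed order that would, for example, tune shooting feedback before ADS is stable.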

We also risk-rated each step. The ADS overhaul was high risk: it touches input, FOV, recoil, and aim stability, so changing one thing might affect five others. Sprint Camera and peek were low risk: the existing logic only needed to be uncommented.

One core principle throughout: unless it affects core experience, prioritize reusing existing capabilities over large-scale refactoring.

After P0 completion, lock the foundation. P1 (advanced feel) and P2 (advanced combat presentation) must not break P0’s established baseline.

VII. AI’s Real Value — Compressing the Cost of “Understanding Current State”

Looking back at the entire process, AI didn’t automatically produce a TPS camera.

What AI did was compress “understanding current state → identifying problems → producing plans → breaking down tasks” from weeks to days.

In this project, AI began to resemble a senior engineer on the team for the first time — not because it writes code, but because it can rapidly understand the entire system.

AI read the complete camera system source code in hours and told us what exists here, what’s missing there, what got disabled by whom, what the config tables look like, and how the data flows. Then, based on these facts (not guesses), it produced plans and schedules.

The traditional flow is “guess first, verify later”: guess what needs building based on experience, then verify after development whether the guess was right. The AI Native flow is “look first, build later”: let AI read through the system first, produce plans based on facts, then develop with targeted precision.

This connects back to the previous article’s core thesis, but at a different dimension. The previous article’s “experience validation first” mainly meant validating gameplay direction with placeholder assets. This practice made us realize: validation first isn’t just validating gameplay — it includes validating “what the existing system can actually do.” Understanding what cards you hold before making moves is itself a form of validation first.

There was an unexpected bonus. AI produced not just a plan, but a “system map” the entire team could understand. Previously only the original code authors knew what the camera system looked like. Now AI organized the complete architecture, data flows, config tables, active features, and disabled features into documentation. New engineers and designers joining could get up to speed by reading this document instead of spending two weeks reviewing code themselves.

This may be an undervalued contribution of AI in large-scale engineering: not just helping you do things, but helping you turn tacit knowledge into explicit knowledge. In many codebases, the greatest asset isn’t the code itself but the contextual information of “why it was written this way” and “what it can do.” This information previously existed only in a few people’s heads. AI makes it documentation everyone can reference.

VIII. This Is Just the Beginning

Camera is only the first step of TPS transformation.

We’ll continue dismantling movement systems, TPS aiming, animation layering, and weapon mounting module by module. Not just presenting “final solutions,” but genuinely recording how AI participates in the continuous refactoring of a large game engine.

Many problems are honestly still far from solved. AI can help teams understand systems faster, validate directions faster, and prototype faster. But “what makes a good experience” still requires humans to repeatedly test and stumble. Camera feel, shooting rhythm, movement weight — no AI can judge these for you. Only someone holding a controller and playing for dozens of hours can gradually approach the answers.

To date, AI’s greatest value remains not “automatically generating games,” but enabling teams to understand systems faster, validate directions faster, and discover what isn’t fun faster.

AI exhausts possibilities; humans decide what’s right. AI understands systems; humans understand experience.

This is an ongoing AI Native development record, not an isolated case study. If you also have an engine that “clearly has capabilities but doesn’t feel right when running,” try letting AI read through it first. You might discover, as we did: the answers may already be in the code — it’s just that nobody had ever looked at the complete picture.

AI Native Game Development — From Pipeline Restructuring to Dynamic Experience

May 12, 2026 by trace yang

What AI Native games truly change isn’t AI-generated assets. It’s that the game industry is shifting for the first time from “competing on asset production capacity” to “competing on experience validation speed.”

I. Industry Thesis: What’s Actually Expensive in Game Development

For the past twenty years, the core bottleneck of the game industry has been content production.

Traditional AAA development relies on massive art teams, long asset production cycles, and heavy-asset iteration pipelines. A AAA project routinely involves hundreds of artists, two to three years of asset creation, and budgets measured in hundreds of millions. On the surface, these costs go toward “making content.” But if you break it down carefully, what’s truly expensive isn’t “writing code” or “building models” — what’s truly expensive is “validating a gameplay direction.”

A level designer has an idea but needs to wait for environment art to be built before validating player flow. A combat designer wants to adjust shooting feel but needs weapon models and VFX in place before sensing feedback. A systems designer wants to test new spawn pacing but needs enemy models and animations ready before running a playtest. Every validation is blocked by art asset progress. The later you discover a wrong direction, the more art capacity you’ve wasted.

We’ve all seen the classic disasters: weapon VFX carefully crafted over three weeks, only to discover TTK is fundamentally wrong once implemented. Boss animations produced over two months, only to find the mechanic isn’t fun when you actually play it. Scene art built over three months, only to discover the map scale is off and combat spaces are too open during playtesting.

What’s truly expensive was never “making assets” — it’s “making the wrong assets.”

Every team has experienced it: a group of people spend months building resources, only to finally realize “this thing just isn’t fun.” That silence in the meeting room — everyone who’s shipped a game understands it.

This is the biggest structural waste in traditional game development: design validation and asset production are tightly coupled.

The real change AI brings is not “automatically generating games” — we’re far from that today. The real change is:

For the first time, game teams can complete gameplay validation before production assets are finished.

This means three things:

Design validation moves earlier — full experience loops can be run on placeholder assets to confirm design direction
Art shifts from “production-driven” to “experience-convergent” — instead of designers validating whatever art finishes first, art converges toward whatever design has already validated
Engineering returns to the center of development pacing — AI dramatically accelerates engineering output, moving “time to a validatable build” from months to days

This is the fundamental difference between AI Native games and traditional game development. It’s not “who AI replaced” — it’s that the rhythm control of the entire development pipeline has changed.

Many teams in the industry are still discussing “when will AI replace artists.” But the question itself is wrong. If a team is still waiting for “AI to generate production-grade AAA assets,” it has likely already missed the real window for AI Native game development. The window isn’t in asset generation. It’s in pipeline restructuring.

II. Why Extraction Shooters, Why Now

Over the past two years, AI discussions in the game industry have clustered around two extremes: “AI-generated 3D assets will soon disrupt art production capacity” on one side, and “AI can only chat and offers no real help for game development” on the other. Both are wrong — the former overestimates 3D generation maturity, the latter underestimates AI’s already-scalable capabilities in engineering and design.

The actually valuable question isn’t “when will AI replace artists” but: Given that 3D generation isn’t mature today, if we adopt a different asset strategy, how far can AI actually push the development of a game?

We chose ARC Raiders-style cooperative extraction shooters, considered here with a PvE core, as our analysis target. We chose them not because this genre is the hottest, but because it’s naturally suited for AI intervention on two levels.

On a technical level, this genre’s structural characteristics maximize AI’s intervention space:

Single-match format means no cross-session memory or long-term narrative consistency is needed, neatly avoiding the area where LLMs are weakest
PvE core means no PvP fairness constraints — AI can freely orchestrate content without breaking competitive balance
Search-fight-extract three-act structure is inherently modular — each phase’s design/engineering/art work can be independently evaluated
Small squad size means minimal data volume — even with runtime AI scheduling, compute costs remain manageable

On an experience structure level, the core fun of extraction shooters inherently comes from uncertainty — dynamic encounters, risk variation, resource pressure, on-the-spot decision making. Players don’t expect the same fixed flow every match; they expect to never know what they’ll encounter each time they drop in. This is precisely what AI Directors excel at: dynamically recomposing experiences. From Left 4 Dead’s AI Director to Roguelike procedural generation, this kind of “every match is different” game structure is naturally suited for AI. Extraction shooters are simply the genre that best matches current technology maturity on this axis.

This isn’t a speculative article about “what AI games will look like in the future.” This is about an engineering approach that can be deployed today, and an honest assessment of what doesn’t work yet.

We decompose game development into 8 pipelines across two layers:

Core Gameplay Layer: Combat, Enemies, Missions/Levels, Characters/Loadout
Scene & Presentation Layer: Environment, Game Feel, Audio, UI

The art side doesn’t pursue AI-generated production-grade 3D assets, but instead follows an “AI placeholder validation + marketplace/project asset reuse + AI batch refinement” approach — not a compromise, but a deliberate choice based on the industry thesis above: decoupling experience validation from asset production.

III. Core Gameplay Layer: Engineering and Design Are Already Fully AI-Ready

1. Combat Pipeline — Parameter-Driven Work, AI’s Natural Fit

The combat pipeline is one of the best demonstrations of AI’s deployment value, but not because “AI can write perfect combat code.” It’s because combat system work is fundamentally parameter-driven.

Weapon DPS, TTK, skill cooldowns, buff stacking rules — these aren’t creative work, they’re math. Traditionally, designers spend enormous time manually filling Excel sheets, iteratively fine-tuning values, running simulations to verify balance. This is exactly what AI excels at: given constraints, exhaustively explore parameter space, output proposals that match design intent. An experienced designer working with AI can complete in one day what previously took a week of numerical iteration.
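As a worked example of "given constraints, explore the parameter space," the sketch below uses a deliberately simple TTK model: every shot hits, damage is flat, the first shot lands at time zero, and reloads, armor, and headshots are ignored. Under those assumptions, the number of shots is the health divided by damage per shot, rounded up, and TTK is the time between the first and the killing shot. The function names and the candidate values are illustrative.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple


def time_to_kill(target_hp: float, damage_per_shot: float, rounds_per_minute: float) -> float:
    """Seconds from the first shot to the killing shot, assuming every shot hits."""
    shots = math.ceil(target_hp / damage_per_shot)
    return (shots - 1) * 60.0 / rounds_per_minute


def search(
    target_hp: float,
    damages: Sequence[float],
    fire_rates: Sequence[float],
    ttk_min: float,
    ttk_max: float,
) -> List[Tuple[float, float, float]]:
    """Enumerate (damage, rpm) pairs whose TTK falls inside the design window."""
    results = []
    for damage, rpm in product(damages, fire_rates):
        ttk = time_to_kill(target_hp, damage, rpm)
        if ttk_min <= ttk <= ttk_max:
            results.append((damage, rpm, ttk))
    return results
```

For a 100 HP target, 25 damage at 600 RPM needs four shots, so the TTK is 3 × 0.1 s = 0.3 s. The search only narrows the table to candidates inside the window; which candidate feels right is still a playtest question.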

Engineering is even more direct. Weapon state machines, damage calculation modules, skill frameworks, buff systems: these are highly pattern-determined engineering tasks. AI code generation quality in these scenarios is already quite stable, not merely at a “usable” level but at a “goes straight into production” level. Batch config table generation and validation is AI’s particular strength, essentially eliminating the risk of manual data entry errors.

The combat pipeline’s bottleneck was never art resources, since a gun model doesn’t affect shooting feel tuning. The bottleneck is numerical iteration speed, and that is exactly the bottleneck AI relieves. A placeholder proxy mounted with AI-generated parameter tables and code lets designers tune TTK today.

2. Enemy Pipeline — Structured Design + Behavior Systems, AI’s Strength

The enemy pipeline has an underappreciated characteristic: its design work is highly structured.

An extraction shooter’s enemy taxonomy is fundamentally a classification problem — build a matrix by body type (light/medium/heavy), behavior pattern (patrol/ambush/chase/AOE), difficulty tier (normal/elite/boss), then fill in attribute parameters and counter-relationships. This “define framework first, fill parameters second” workflow is where AI is both fast and stable. Budget bucket parameters, patrol route plans, wave config tables, difficulty curves — where designers previously needed repeated playtesting to tune, AI can batch-generate first versions based on design intent, with designers doing precision adjustments on top. Efficiency gains aren’t percentage-point improvements — they’re order-of-magnitude.
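The "define framework first, fill parameters second" workflow can be illustrated with a few lines that enumerate the matrix. The three axes come from the paragraph above. The health numbers, the multipliers, and the row layout are placeholder assumptions that a designer would overwrite.

```python
from itertools import product
from typing import Dict, List

BODY_TYPES = ("light", "medium", "heavy")
BEHAVIORS = ("patrol", "ambush", "chase", "aoe")
TIERS = ("normal", "elite", "boss")

# Placeholder numbers for a designer to overwrite.
BASE_HP = {"light": 50, "medium": 100, "heavy": 200}
TIER_HP_MULTIPLIER = {"normal": 1, "elite": 3, "boss": 10}


def build_enemy_matrix() -> List[Dict[str, object]]:
    """Create one first-draft row per (body, behavior, tier) combination."""
    rows: List[Dict[str, object]] = []
    for body, behavior, tier in product(BODY_TYPES, BEHAVIORS, TIERS):
        rows.append({
            "id": f"{body}_{behavior}_{tier}",
            "body": body,
            "behavior": behavior,
            "tier": tier,
            "hp": BASE_HP[body] * TIER_HP_MULTIPLIER[tier],
        })
    return rows
```

Three body types, four behaviors, and three tiers give 36 rows. Not every cell needs to ship; the matrix mainly makes gaps and duplicates visible before anyone builds a model.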

Engineering-side, behavior tree/state machine code, perception systems, SpawnManager — equally pattern-determined work with stable AI generation quality.

What does validating spawn pacing actually require? A working SpawnManager plus a few capsule proxies, not meticulously crafted monster models. Getting “a testable enemy system” goes from weeks to days. What AI eliminates isn’t workload — it’s waiting.

3. Mission/Level Pipeline — The Modular Advantage

Extraction shooter mission design has a natural advantage: the three-act structure is inherently modular.

Search phase POI configuration, combat phase encounter design, extraction phase pressure curves — each phase can independently define parameters and rules. AI can generate mission type libraries (destroy, escort, collect, defend, recon) and POI function assignments based on map topology. The designer’s core value here isn’t “coming up with these mission types” (AI can cover that), but “judging which combinations feel good experientially” — this is a game-feel judgment call that AI cannot yet replace.

Engineering-side, mission state machines, trigger systems, event dispatchers are all standard AI deployment scenarios. Dynamic event system frameworks are mature generation targets.

This pipeline has high engineering volume and clear patterns — AI’s ROI is very high. Traditionally, validating a level’s extraction pacing required waiting for scene art to be completed — months. With AI whitebox + mission system code, the cost of discovering a wrong direction drops from three months to two days.

4. Character/Loadout Pipeline — Config-Table-Intensive, AI’s Efficiency Dominance

Equipment system design and economy model simulation are typical design-side AI scenarios. But where AI delivers the most value on this pipeline is actually config table generation.

A mid-scale extraction shooter might have dozens of weapons, multiple armor sets, and numerous stratagems. Each requires complete DataTable configuration: base attributes, quality bonuses, upgrade curves, compatibility rules. Traditionally this is pure manual labor — time-consuming and extremely error-prone. AI batch generation + auto-validation speeds this up 5-10x while virtually eliminating data consistency issues.
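The auto-validation half of that claim is the easier one to show. The sketch below checks a batch of generated rows for a few consistency rules. The required field names and the rules themselves (unique IDs, positive damage, non-decreasing upgrade curve) are assumptions chosen for illustration, not the project's actual schema.

```python
from typing import Dict, List

REQUIRED_FIELDS = ("id", "base_damage", "quality_bonus", "upgrade_curve")


def validate_rows(rows: List[Dict[str, object]]) -> List[str]:
    """Return a list of human-readable errors; an empty list means the batch passed."""
    errors: List[str] = []
    seen_ids = set()
    for index, row in enumerate(rows):
        missing = [field for field in REQUIRED_FIELDS if field not in row]
        if missing:
            errors.append(f"row {index}: missing {', '.join(missing)}")
            continue  # the remaining checks need all fields present
        if row["id"] in seen_ids:
            errors.append(f"row {index}: duplicate id {row['id']}")
        seen_ids.add(row["id"])
        if row["base_damage"] <= 0:
            errors.append(f"row {index}: base_damage must be positive")
        curve = row["upgrade_curve"]
        if any(later < earlier for earlier, later in zip(curve, curve[1:])):
            errors.append(f"row {index}: upgrade_curve decreases")
    return errors
```

Running a check like this on every generated batch is what keeps fast generation from becoming fast error production.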

The underestimated cost in game development isn’t “thinking of a good design” — it’s “turning a design into hundreds of error-free config rows.” What AI changes first isn’t asset production — it’s config production.

Worth adding: AI is particularly well-suited for parameter space exploration, numerical balancing, build combination analysis, wave design, encounter matrices, drop configuration, config validation — and these happen to be exactly what mid-level designers spend most of their time on daily. AI won’t replace “people who can design,” but it will rapidly eliminate “people who can only fill spreadsheets.” The core value of design is shifting from “executing configuration” back to “making experience judgments.”

IV. Scene & Presentation Layer: Engineering Still Strong, But Feel Tuning Needs Humans

5. Environment Pipeline — AI Does Infrastructure, Humans Do Spatial Feel

The environment pipeline best illustrates AI’s capability boundary.

What AI can do is concrete: map functional zoning, POI density suggestions, procedural vegetation/prop placement rules, terrain generation and navmesh code. These are all rule-driven, data-driven work with clear inputs and outputs.

What AI can’t do is equally clear: spatial feel. A good extraction shooter map derives its tension from sightline management, from the spatial narrative of “turning a corner into an enemy,” from the tactical depth created by elevation differences and cover distribution. This requires level designers’ intuition and extensive playtesting — AI cannot currently replace this.

The most pragmatic approach: AI handles “infrastructure” (terrain generation, streaming, interaction code), humans handle “experience” (spatial layout, atmosphere, tactical path design). Collaboration, not replacement.

But even in the environment pipeline, the validation checkpoint still moves earlier. AI-generated whitebox terrain, though rough, is sufficient for level designers to validate spatial scale, sightline relationships, and tactical paths. Confirm “is it fun” first, then invest resources to make it “look good.” What this sequence saves isn’t just time — it eliminates the classic disaster of “spending three months building a scene only to discover the scale is wrong.”

6-8. Game Feel / Audio / UI

Game Feel: Engineering-side (screen shake, hit stop, camera shake, destruction system code) AI can generate directly, but final hit feedback tuning is subjective judgment work requiring human iteration. VFX assets follow marketplace reuse + AI parameter fine-tuning.

Audio: AI-generated temporary SFX and music serve the prototype phase — far better than “no audio,” because you simply cannot evaluate hit feedback without sound. Final quality audio still requires professional production.

UI: The pipeline with highest art-side AI maturity. 2D UI element AI generation quality is already production-ready, combined with marketplace UI Kits for essentially full AI coverage. This isn’t future speculation — it’s happening today.

V. Development Rhythm Shift: Who Defines the Production Timeline

For the past decade, game development’s rhythm control has effectively been held by the art production pipeline.

Because gameplay validation depends on assets, scene validation depends on environments, combat validation depends on VFX, flow validation depends on level art. Engineering determines “can the system run,” but what truly determines “when can we validate the experience” is asset production speed. Designers wait for artists, artists wait for outsourcing, outsourcing waits for feedback — the entire chain’s rhythm is dictated by its slowest link.

AI changes this. When AI can batch-generate code, auto-generate configs, rapidly build whiteboxes, and generate playable placeholder assets — game development returns to “engineering + design driven” for the first time.

This change matters far more than “X% efficiency improvement.” It means:

Iteration cycles compress — from monthly (wait for assets → test → feedback → revise) to daily (AI generates → test → feedback → regenerate)
Experimentation costs plummet — want to try three different level layouts? Previously that meant triple the scene production time. Now it means three AI whiteboxes, all validated within a week
Small teams gain large-team iteration velocity — not because fewer people is better, but because AI eliminates “waiting,” the biggest time black hole

This also changes role relationships within teams. In traditional pipelines, designers submit documents then enter a long wait — waiting for art assets, waiting for engineering to build systems, waiting for integration to become playable. Now, the same day a designer submits a document, AI can generate a playable version. Designers shift from “submit requirements then wait” to “submit requirements then immediately validate.” Art’s role changes too: no longer “build first so designers can validate,” but “designers validate first, then tell art which direction to build.”

Key judgment: The most competitive teams in the AI Native era won’t necessarily be those with the most art assets, but those who can validate “what’s fun” the fastest. Engineering + AI-driven iteration speed is becoming the new core competitive advantage.

VI. Pipeline-by-Pipeline Validation: Running the Game Before Assets Arrive

Chapter V covered the macro rhythm shift. This chapter covers the specifics: how designers and artists complete validation during the placeholder phase, pipeline by pipeline.

Combat: After AI generates values + code, mount placeholder weapons and temporary VFX, and designers can immediately validate shooting feel and TTK in-game. No need to wait for weapon models. Artists observe VFX timing and rhythm on this validated build, confirming direction before marketplace selection — rather than buying assets first and discovering they don’t fit.

Enemies: Capsule proxies color-coded by unit class are sufficient for designers to test spawn pacing and difficulty curves. Artists confirm “how much body size differentiation is needed” and “what silhouette features ensure readability,” then approach marketplace asset matching with clear requirements.

Missions/Levels: AI whitebox levels let designers run the full search-fight-extract flow and check whether pathing is smooth, POI density is right, and extraction tension is sufficient. Once the experience holds up on whitebox, bringing in production scenes is just reskinning, not direction gambling.

Environment: Confirm spatial scale and sightline relationships on whitebox. Artists plan production scene layout direction accordingly, avoiding the “three months building a scene, then discovering the map is too big/small/has flow problems” disaster.

Audio: AI generates temporary gunfire/explosion/ambient SFX, letting designers experience the complete audio feedback chain. Validate audio layering and priority relationships with placeholders before commissioning production audio.

UI: AI generates debug HUD so designers see real-time health/ammo/status data in-game, validating information clarity and interaction flow.

Every pipeline follows the same pattern (see Experience Validation Loop diagram).

Traditional pipeline: “Art leads, design follows assets”
New pipeline: “Design leads, assets follow experience”
The most expensive thing in traditional game development isn’t production — it’s direction gambling. The new pipeline reduces direction gambling costs by an order of magnitude.

VII. Art Asset Strategy: Not Waiting for AI to Generate 3D — You Don’t Need It To

There’s an industry mental habit: when discussing AI’s value in game development, the conversation always gets stuck on “when will 3D asset generation mature.” As if AI has nothing to offer game development until 3D generation is ready.

This thinking is wrong.

3D asset generation is indeed immature, with no visible path to production-grade quality in the near term. But that doesn’t mean the art side can only wait. Our strategy is a four-phase workflow that completely bypasses the “AI generates 3D” bottleneck.

Phase 1: Validation — Testable in 24 Hours

After designers produce design documents, AI immediately generates whitebox levels, proxy models, temporary VFX, test audio, and batch test configs. The key isn’t these placeholders’ quality (they’re rough), but that they unblock gameplay validation from art pipeline progress. Any gameplay idea can reach a testable state within 24 hours.

Phase 2: Procurement — Don’t Reinvent Wheels

Once gameplay direction is confirmed, source from UE Marketplace / Fab, while reusing existing project assets. Small teams shouldn’t spend resources on problems the marketplace has already solved. The problem isn’t “can’t find assets” — it’s “assets from different sources look like a mashup when placed together.”

Phase 3: Refinement — Turning Mashup Into Cohesion

The most critical phase. AI refinement pipelines turn “manual one-by-one adjustment” into batch processing:

High maturity, directly batch-deployable:

Texture resolution alignment — AI super-res upscaling with linked normal/AO/roughness detail fill
Livery/variant batch generation — color tone, wear level, camo variants in one batch
Naming/directory standardization — AI scripts for batch renaming and auto-categorization

Moderate maturity, human-assisted:

Style unification — AI batch texture repaint for unified color temperature/wear/grime
Material standardization — AI PBR parameter calibration to unified standards
LOD auto-generation, animation retargeting, scale calibration, collision body adaptation
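Of the items above, naming standardization is the simplest to show in code. The sketch below builds a rename plan without touching any files. The prefixes follow a widely used Unreal community convention (SM_ for static meshes, T_ for textures, M_ for materials), which is an assumption here and not something this project specifies. The actual rename would go through the engine's own asset tooling so that references stay intact.

```python
import re
from typing import List, Tuple

# Assumed prefixes, following a common Unreal community naming convention.
PREFIX_BY_TYPE = {"static_mesh": "SM_", "texture": "T_", "material": "M_"}


def standardize(name: str, asset_type: str) -> str:
    """Return the name as Prefix_Word_Word, keeping an existing prefix if present."""
    prefix = PREFIX_BY_TYPE[asset_type]
    stem = name[len(prefix):] if name.startswith(prefix) else name
    words = re.findall(r"[A-Za-z0-9]+", stem)
    return prefix + "_".join(word[:1].upper() + word[1:] for word in words)


def rename_plan(assets: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """List (old_name, new_name) pairs for assets that need renaming."""
    plan = []
    for name, asset_type in assets:
        new_name = standardize(name, asset_type)
        if new_name != name:
            plan.append((name, new_name))
    return plan
```

Producing a plan first, rather than renaming directly, leaves room for a human to review collisions (two source assets mapping to the same target name) before anything changes.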

Phase 4: Replacement

Production assets replace placeholders, artists do final polish, ship.

Why This Strategy Beats “Waiting for 3D Generation to Mature”

First, it works today. No technology breakthroughs needed.

Second, it changes development rhythm. The entire pipeline shifts from “wait for assets → make game” to “make game → swap assets.”

Third, it lowers the cost of betting on a direction. Wrong design directions are caught in the placeholder phase, so three months of art capacity no longer go into “betting on a direction.”

The four-phase workflow isn’t an “art asset management plan” — it removes “waiting for assets” from the critical path entirely.

VIII. Runtime Experience: AI Director Possibilities and Boundaries

Beyond applying AI to the development pipeline, extraction shooters also leave room for AI intervention at runtime. To be clear, this part is still early exploration: unlike the already-deployable capabilities discussed above, it is directional thinking rather than tested practice.

Why Extraction Shooters Are Ideal for AI Director Experimentation

Several characteristics make PvE extraction shooters an ideal testbed: no PvP fairness constraints, a small volume of squad-level data, a single-match format that needs no cross-session memory, and a three-act structure naturally suited to phased orchestration.

Why You Can’t Let an LLM Directly Control Enemies

This is the key question for understanding AI director architecture.

Runtime scenarios an LLM is fundamentally unsuited for:

Combat Tick: damage calculation, collision detection, and state updates execute every frame; LLM inference latency (hundreds of milliseconds to seconds) is completely unacceptable
NavMesh real-time control: pathfinding, obstacle avoidance, and formation maintenance require millisecond-level response
High-frequency decisions: “should this enemy fire or reload this frame” is far too high-frequency for an LLM
Determinism requirements: the same input to a behavior tree always produces the same output; an LLM doesn’t guarantee this, which is fatal in multiplayer synchronization

Four hard constraints dictate that an LLM must operate above frame-level logic: latency, token cost, determinism, and debuggability.

Runtime scenarios where an LLM truly excels:

Wave Orchestration — “what spawns in the next 30 seconds, from which direction”
Encounter Pacing — “should we increase pressure or give breathing room”
Director Orchestration — “trigger an ambush” or “deploy a friendly distress signal”
Content Mutation — generate different mission/POI/enemy configs each match
Event Injection — trigger dynamic events at the right moments

Key judgment: an LLM is unsuited to the Combat Tick but well suited to Encounter Orchestration. The LLM decides “what to do”; behavior trees decide “how to do it.”
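
The split between “what to do” and “how to do it” can be sketched as two layers running at very different rates. Everything here is illustrative: `EncounterDirector`, the directive names, and `enemy_action` are hypothetical, the planner is a plain callable standing in for an LLM request, and the plain function stands in for a behavior tree. The planner is called synchronously for brevity; a real integration would presumably run it off the game thread.

```python
from typing import Callable

# Hypothetical set of high-level directives the planner may choose from.
DIRECTIVES = {"increase_pressure", "breathing_room", "trigger_ambush"}

class EncounterDirector:
    """Consults a slow planner at most once per interval."""

    def __init__(self, planner: Callable[[dict], str], interval_s: float = 30.0):
        self.planner = planner
        self.interval_s = interval_s
        self.next_call_at = 0.0
        self.directive = "breathing_room"

    def update(self, now_s: float, summary: dict) -> str:
        # Called every frame, but the planner only runs when the interval elapses.
        if now_s >= self.next_call_at:
            proposed = self.planner(summary)
            if proposed in DIRECTIVES:  # ignore anything outside the whitelist
                self.directive = proposed
            self.next_call_at = now_s + self.interval_s
        return self.directive

def enemy_action(directive: str, ammo: int, has_line_of_sight: bool) -> str:
    """Deterministic per-frame choice, standing in for a behavior tree."""
    if ammo == 0:
        return "reload"
    if directive == "breathing_room":
        return "hold_position"
    return "fire" if has_line_of_sight else "advance"
```

At 60 frames per second with a 30-second interval, one minute of play triggers two planner calls against 3,600 deterministic per-frame decisions, which is the ratio the latency and cost constraints above call for.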

AI Director Architecture

Honestly, What’s Not Solved Yet

Output stability: an LLM may generate unreasonable config combinations, so extensive constraint rules and output validation are needed
Experience quality control: “numerically balanced” and “fun” are two different things
Debugging difficulty: bad AI director decisions are harder to trace and fix than rule system issues
Cost: per-match LLM inference server costs need serious accounting
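
For the output stability item, a minimal sketch of what output validation might look like: the generated text is parsed and checked against explicit rules before it reaches the game. The JSON shape, the enemy table, and the threat budget are assumptions invented for this example.

```python
import json

# Hypothetical rule tables; a real project would load these from design data.
ALLOWED_ENEMIES = {"grunt": 1, "heavy": 4, "sniper": 3}  # type -> threat cost
ALLOWED_DIRECTIONS = {"north", "south", "east", "west"}
MAX_THREAT_PER_WAVE = 20

def validate_wave_config(raw_text: str) -> list[str]:
    """Return a list of problems; an empty list means the config is accepted."""
    try:
        config = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    waves = config.get("waves") if isinstance(config, dict) else None
    if not isinstance(waves, list) or not waves:
        return ["'waves' must be a non-empty list"]

    errors = []
    for i, wave in enumerate(waves):
        if not isinstance(wave, dict):
            errors.append(f"wave {i}: must be an object")
            continue
        direction = wave.get("direction")
        if not (isinstance(direction, str) and direction in ALLOWED_DIRECTIONS):
            errors.append(f"wave {i}: unknown direction {direction!r}")
        spawns = wave.get("spawns")
        if not isinstance(spawns, dict) or not spawns:
            errors.append(f"wave {i}: 'spawns' must be a non-empty object")
            continue
        threat = 0
        for enemy, count in spawns.items():
            if enemy not in ALLOWED_ENEMIES:
                errors.append(f"wave {i}: unknown enemy {enemy!r}")
            elif isinstance(count, bool) or not isinstance(count, int) or count < 1:
                errors.append(f"wave {i}: bad count for {enemy!r}")
            else:
                threat += ALLOWED_ENEMIES[enemy] * count
        if threat > MAX_THREAT_PER_WAVE:
            errors.append(f"wave {i}: threat {threat} exceeds {MAX_THREAT_PER_WAVE}")
    return errors
```

A rejected config can be regenerated or replaced by a hand-authored fallback. Rules like these catch structural and numeric nonsense only; they do not address the “fun” problem in the next item.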

The more realistic current approach: let the LLM handle pre-match config generation first (one-time inference, controllable cost), and keep mid-match scheduling on traditional rules plus weighted randomization. After validating quality and stability, gradually expand the LLM’s decision scope. Don’t bet on a full AI director from day one.
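
The “traditional rules + weighted randomization” half of this approach can be sketched in a few lines. The event names and weights are hypothetical; in the setup described above, a table like this could be part of the pre-match config, while the picker itself involves no LLM call during the match.

```python
import random

# Hypothetical event weights per pressure state (illustrative numbers only).
EVENT_WEIGHTS = {
    "low": {"patrol": 5, "ambush": 3, "supply_drop": 1},
    "high": {"patrol": 2, "ambush": 1, "supply_drop": 4},
}

def pick_event(rng: random.Random, pressure: str) -> str:
    """Weighted random choice among the events allowed for this pressure state."""
    weights = EVENT_WEIGHTS[pressure]
    names = sorted(weights)  # fixed order keeps seeded runs reproducible
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

def schedule_match(seed: int, pressure_timeline: list[str]) -> list[str]:
    """Produce one event per timeline step from a single seeded generator."""
    rng = random.Random(seed)
    return [pick_event(rng, p) for p in pressure_timeline]
```

Because the generator is seeded per match, the same seed and timeline reproduce the same schedule, so a bad match can be replayed and traced, which is exactly the debuggability that the LLM path currently lacks.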

IX. Summary: What AI Changed, What It Didn’t

What AI Changed

Engineering is fully AI-ready across all pipelines. The engineering side of all 8 pipelines can be deployed at scale, with a 60-70% reduction in baseline code workload. The point is not to replace engineers but to free them from repetitive work so they can focus on architecture and core systems.

Design shifts from “filling spreadsheets” to “making judgments,” with 5-10x efficiency gains. AI eliminates the manual labor in config production, freeing designers’ creative bandwidth.

Control over development rhythm has changed. Through the “AI placeholder → marketplace reuse → AI refinement” strategy, validation no longer waits for assets. Want to try three different level layouts? That no longer means triple the scene production time; it means three AI whiteboxes, all validated within a week.

What AI Didn’t Change

3D art assets still depend on humans. This won’t change in the short term. But as analyzed above, this doesn’t prevent AI from delivering enormous value; the key is choosing the right strategy.

Work requiring “feel” and “judgment” remains human. Level spatial feel, hit feedback tuning, audio layering, art style direction — AI can assist but cannot replace these.

Game design creativity itself hasn’t changed. AI excels at “filling in content given a framework,” not “inventing an addictive core loop.” Whether an extraction shooter is fun depends not on how fast config tables generate, but on core loop design quality.

Greatest Efficiency Levers

Full AI adoption on engineering → The biggest, most certain, immediately deployable lever
Design numerical/config automation → Freeing designers’ creative bandwidth
Art asset reuse + AI refinement → Bypassing 3D generation bottleneck, changing development rhythm
MCP toolification → Establishing technical foundation for runtime AI directors

For the past twenty years, the game industry has increasingly resembled a “content manufacturing industry.” Teams grew larger, pipelines grew heavier, validation grew slower. A gameplay idea could wait through months of asset production before being validated.

AI Native game development may mark the first time the industry returns to an era of “small teams, rapid experimentation.”

The game industry is shifting from “competing on asset production capacity” to “competing on experience validation speed.” The most competitive teams of the future won’t necessarily be those with the most assets, but those who can validate “what’s fun” the fastest.

The greatest significance of AI Native development may not be making games “auto-generate,” but enabling teams to discover “what isn’t fun” as early as possible.

AI hasn’t made making games “easy,” but it has made it possible for a small team to build what previously required a large team. That is the true meaning of AI Native game development.