Part Two of the series · Opening of the practical track. Continues from the overview piece, AI-Native Game Development — From Asset Reuse to Experience Reuse, whose methodology now enters its empirical phase within a real project.
A question that must be answered first
The overview piece systematically surveyed “the stages of game development at which AI can play a role” — retrieving references, translating designs, generating code, tuning values, executing verification, and integrating assets. The argument was complete.
Yet once the argument concluded, one question remained unresolved: Is this methodology merely a notion confined to the demonstration level, or a pipeline that can genuinely be put into operation?
The value of theoretical reasoning is limited. Accordingly, from this piece onward, this “AI-assisted development pipeline” is run once, in full, within a real project, in order to test whether it holds.
The project is defined as follows: on top of an existing framework that has already been upgraded to UE5.7 and has most systems in place, build a third-person extraction-shooter prototype whose combat and orchestration experience is benchmarked against a reference PvE shooter. The cycle is three months, led by the developer, with AI assisting.
There is a premise here that must be made explicit — the prototype itself is merely the vehicle. What truly undergoes examination is the AI-assisted development pipeline. This judgment is the central axis of the entire series, and the text below will return to it repeatedly.

Why choose the extraction-shooter genre, and why benchmark against this title
The core of the extraction-shooter genre is a closed loop: insert into the field → scavenge for resources → complete the objective → extract under pressure. Its appeal lies not in any single dazzling system, but in the continuous trade-off between “press deeper or extract safely,” and in the tension formed by that “last stand” during the extraction phase.
The benchmark title, taken as the reference object, pushes the experience above to its limit: the sense of oppression formed by large numbers of enemies closing in on the same screen; the satisfying feedback of calling in support from above via a command sequence; the heavy — rather than nimble — feel of heavy-equipment movement; the shooting feedback in which every hit carries weight; and one critical premise — it is a game of four-player cooperation, in which solo play is difficult to sustain under its mechanics.
What this project sets out to reproduce is this “felt experience,” not a verbatim copy of its system implementation. This delineation determines what AI should, and should not, undertake in this project.
The first task before any hands-on work: complete a systematic inventory
Many people’s notion of “making a game with AI” is to open the editor and immediately have AI generate. This project’s first step is the opposite — first have AI thoroughly survey this framework and clarify which systems are already in place, which need to be supplemented, and which must be built from scratch.
This is precisely one of the most underrated of AI’s capabilities: code archaeology. The conclusions of the inventory make the overall landscape clear at a glance:
- Largely in place (a matter of adjusting configuration): weapon damage (data-table-driven), scavenging and pickup, extraction and insertion, basic character controls
- Partially in place (a key link to be supplemented): the camera (first-person is complete; the first ↔ third-person switch is missing), enemy AI (behavior trees and crowd management are complete; enemy waves are missing)
- Accumulated but not yet connected: the asset pipeline for the target source material (the toolchains for unpacking, material reverse-engineering, and level reverse-engineering are all mature, but not yet integrated into the framework)
Only three things genuinely need to be built from scratch: the first ↔ third-person view switch, the stratagem call-down system, and the enemy-wave system. Of these, the stratagem call-down is the only entirely new gameplay system — the rest are reuse, tuning, and asset integration.
This judgment established the priorities of the entire plan: effort is not invested in reinventing the wheel, but is concentrated on these three items and on “tuning until the experience is accurate.”
🔧 Design Retrospective · Why inventory first, then act
Why this design: The greatest risk of starting work without an inventory is duplicated construction — spending two weeks implementing a feature that is already ninety percent complete within the framework. By first having AI clarify the present state, effort can be directed toward the genuinely empty spaces.
Where it bit us: An inventory is not a matter of having AI “read through the code and offer a problem-free conclusion.” It must produce a present-state list that can be grounded in specific call chains and specific data-table fields; otherwise it amounts to having taken no inventory at all.
How the traditional approach falls short: For a person to read through hundreds of thousands of lines of unfamiliar code in order to clarify a framework is a workload measured in weeks. Code archaeology compresses this to a workload measured in days — and this is the precondition for fitting the entire plan within a three-month cycle.
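As a rough illustration of what a "grounded" inventory entry might look like, the sketch below records each system's status together with the evidence behind it, and flags any entry whose claim has no call chain or data-table field attached. It is plain Python rather than anything from the framework; the class names, the status labels as an enum, and the `WeaponTable.BaseDamage` field are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    IN_PLACE = "largely in place"
    PARTIAL = "partially in place"
    UNCONNECTED = "accumulated but not yet connected"
    MISSING = "build from scratch"


@dataclass
class InventoryEntry:
    system: str
    status: Status
    # Evidence the status claim rests on (all names here are hypothetical).
    call_chains: list[str] = field(default_factory=list)
    data_table_fields: list[str] = field(default_factory=list)

    def is_grounded(self) -> bool:
        # Assumption of this sketch: a "missing" system has nothing to point at,
        # so it is exempt from the evidence requirement.
        if self.status is Status.MISSING:
            return True
        return bool(self.call_chains or self.data_table_fields)


def ungrounded(entries: list[InventoryEntry]) -> list[str]:
    """Return the systems whose status claim has no supporting evidence."""
    return [e.system for e in entries if not e.is_grounded()]


entries = [
    InventoryEntry("weapon damage", Status.IN_PLACE,
                   data_table_fields=["WeaponTable.BaseDamage"]),
    InventoryEntry("enemy AI", Status.PARTIAL),  # a claim with no evidence yet
    InventoryEntry("stratagem call-down", Status.MISSING),
]
print(ungrounded(entries))  # systems to send back for another survey pass
```

The point of the check is procedural: an entry that cannot name where in the code its conclusion comes from goes back for another pass instead of into the plan.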
A judgment derived from the inventory: multiplayer is not a burden, but an existing dividend
One finding from the inventory directly revised the workload estimate.
The benchmark experience is built around cooperation. A common expectation is that, within a three-month solo prototype cycle, implementing multiplayer would cause the workload to spiral out of control: state synchronization, property replication, and multi-client consistency are typically the difficulties that consume a development cycle.
A survey of the framework’s network layer led to a different conclusion: this is a production-grade multiplayer shooter framework. Within a single core module alone there are nearly two hundred network-replication declarations; character movement, properties, and state machines are all designed around the replication mechanism.
This means the positioning of multiplayer shifts from “whether to do it” into an entirely different proposition: it is not “building the network layer,” but “reusing the existing network layer, while avoiding implementing new systems as single-player logic that is not multiplayer-ready.”
For this reason, multiplayer was not set up as a separate “multiplayer milestone” — that would lead to multiplayer being integrated last, with all earlier systems implemented in a single-player manner and then reworked. It is instead set up as a cross-cutting principle: from the very first milestone, all new systems are designed in a multiplayer-ready manner, with state following the existing replication paths. Verification is carried out directly in the form of multi-player cooperation.
So what kind of constraint, specifically, is “designed in a multiplayer-ready manner”? It is not abstract; grounded in code, it amounts to several explicit guidelines.
Consider one of the most error-prone counter-examples. In the stratagem call-down, the player inputs a command sequence, throws a beacon, and seconds later an orbital strike descends from above. The not-multiplayer-ready implementation is: the client detects the beacon’s landing locally, spawns the orbital strike locally, and resolves the damage locally — it runs smoothly in a single-player environment, but breaks the moment multiplayer is introduced: each client computes on its own, and the landing point, the damage, and even “whether that support arrived at all” as seen by teammates may all be inconsistent. The multiplayer-ready implementation is: the beacon throw is treated as a request; the server determines the landing point and timing, the server spawns the support entity, the server resolves the damage, and then the results are replicated to all clients. The presentation layer (projectile trajectory, screen shake, sound effects) may be rendered locally, but any determination that affects game state must be vested in a single authoritative end.
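The difference between the two implementations can be sketched in a few lines. The following is an engine-agnostic model in plain Python, not Unreal code: `Server`, `Client`, `StrikeResult`, the delay, radius, and damage numbers are all hypothetical, and for brevity the damage is resolved at request time, whereas a real server would resolve it when the impact time arrives.

```python
from dataclasses import dataclass, field


@dataclass
class StrikeResult:
    """State decided by the server and replicated to every client."""
    strike_id: int
    landing_point: tuple[float, float]
    impact_time: float
    damage: dict[str, int]  # target id -> damage applied


@dataclass
class Server:
    now: float = 0.0
    health: dict[str, int] = field(default_factory=dict)
    positions: dict[str, tuple[float, float]] = field(default_factory=dict)
    next_id: int = 1

    def request_strike(self, beacon_point, delay=3.0, radius=5.0, damage=100):
        """A thrown beacon is only a request; the server decides the outcome."""
        hits = {}
        for target, (x, y) in self.positions.items():
            dx, dy = x - beacon_point[0], y - beacon_point[1]
            if dx * dx + dy * dy <= radius * radius:
                self.health[target] = max(0, self.health[target] - damage)
                hits[target] = damage
        result = StrikeResult(self.next_id, beacon_point, self.now + delay, hits)
        self.next_id += 1
        return result


@dataclass
class Client:
    name: str
    seen: dict[int, StrikeResult] = field(default_factory=dict)

    def on_replicated(self, result: StrikeResult) -> None:
        # Presentation (trail, screen shake, sound) would be played locally here;
        # the client stores the server's result and never computes its own.
        self.seen[result.strike_id] = result


server = Server(health={"enemy_1": 150, "enemy_2": 150},
                positions={"enemy_1": (1.0, 1.0), "enemy_2": (40.0, 0.0)})
clients = [Client(f"player_{i}") for i in range(4)]

result = server.request_strike(beacon_point=(0.0, 0.0))
for c in clients:
    c.on_replicated(result)

print(all(c.seen[1] == result for c in clients), server.health)
```

All four clients end up holding the same landing point and the same damage table because none of them was ever allowed to compute one.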
The same guideline runs through every system: who determines an enemy’s health and death, who advances mission progress, whose clock governs the extraction countdown — the answer is always “server-authoritative, client-presentational.” This is not a profound architecture, but rather a default habit at coding time: for each newly added field that will alter game state, first confirm “whether it needs to be replicated, and who modifies it.”
This is precisely where AI assistance can play the greatest role, and also where it is most prone to error. Where it plays a role: AI is able to understand the usage of the framework’s existing replication macros and, following the established paradigm, wire replication into the state fields of new systems. Where it errs: absent human oversight, AI-generated code tends toward single-player logic of the “modify locally and it just runs” kind, because, given the tendencies formed by its training, the single-player implementation is shorter and looks more “self-consistent.” For this reason, along the multiplayer track, the human’s responsibility is to keep asking: “Is this state consistent in a multiplayer environment?”
This is treated separately because it points to a more general judgment — in AI-assisted development, what is most irreplaceable about the human is often not writing code, but holding those systematic constraints that lie outside AI’s field of view. AI excels at giving a swift and accurate solution within a delimited local scope; but constraints such as “this code must hold in a multiplayer environment” and “this field will be observed by four clients simultaneously” are not within its current context, and are therefore difficult for it to take into account. Multiplayer is merely the most typical example of such a constraint. Across the entire prototype, the real center of gravity of the work lies not in “getting AI to produce code,” but in “guarding, on AI’s behalf, the boundaries it cannot see.” This is precisely one of the core matters the series is to validate: whether this division of labor — the human holds the constraints, AI produces the solutions — can operate stably.
🔧 Design Retrospective · Why multiplayer is a principle rather than a milestone
Why this design: To place multiplayer as a standalone phase at the end is tantamount to tacitly permitting all earlier systems to be implemented in a single-player manner first and then retrofitted — the arrangement with the highest rework cost.
Where it bit us: The initial framing nearly handled it as a standalone “to be revisited later” module. The true risk lies not in the network layer (which is already in place), but in whether new systems get casually implemented as non-replicable single-player logic. Bringing it forward as a principle is precisely what averts such rework at the source.
How the traditional approach falls short: The traditional practice often treats multiplayer enablement as a dedicated “retrofit project.” When the underlying network layer is already mature, the more economical approach is to let it recede into invisibility — folded into the design constraints of every system, rather than advanced as a separate initiative.
The route: first connect what is “playable,” then enrich what is “replayable”
The order of progression across the thirteen weeks was deliberately designed; it is not simply a numbered list.
Stage One · Initiation + three preliminary validations (W0-1). Confirm that the framework can complete a full-project build and can launch the editor; at the same time, conduct preliminary validation on the three least certain pipelines — assets (can they be unpacked and integrated), network (can multiplayer stand up a server and run), and mission/HUD (the degree to which they are already in place). Expose the high-risk items in the very first week, rather than dragging them to the end only to discover unworkability.

Stage Two · View overhaul (W1-3, first deliverable). Overhaul the first-person view into a third-person view. This is precisely the first empirical test of the overview piece’s argument about “reworking an FPS into a TPS.”

Stage Three · Movement and gunplay feel (W4-5). The heavy feel of heavy-equipment movement, and solid shooting feedback.

Stage Four · Stratagem call-down (W6-7, built from scratch). Calling in support from above via a command sequence — this is the most iconic gameplay of the benchmark title, and also the only gameplay system in this project built from scratch. It is relatively self-contained and does not depend on enemies, so it is deliberately brought forward ahead of the enemies, in order to validate the complete chain of “input → beacon → descent from above” in an empty-field environment.
It warrants noting separately that this is the only genuinely 0-to-1 new system in the entire prototype. The remaining milestones either rework systems already in place or make increments on an existing architecture; this one alone begins from a blank sheet. For this very reason, it will become the litmus test of whether “AI-assisted from 0 to 1” holds. In the aforementioned “reuse and tuning” links, AI’s value is relatively predictable. Faced with a system for which there is no precedent within the framework, can AI provide support all the way from design translation through to code generation? This is the project’s greatest unknown, and the key indicator to observe in this phase is how much support AI can actually provide when “there is no ready paradigm to follow.”

Stage Five · Monsters (W8-9, enemy waves built from scratch). The readable threat of a single enemy, and the swarming pressure of the crowd.

Stage Six · Loop + missions + multiplayer (W10-11). Integrate the components into one complete, playable round of the extraction-shooter loop. This stage juxtaposes three items because they are inherently three inseparable facets of “one round”: the loop skeleton, the mission system, and multi-player cooperation.

Stage Seven · Replayability polish + asset integration (W11-13). Tune for the pull of “one more round,” and integrate the assets of the benchmark title in full. Meanwhile, UI/HUD does not belong to any single week: it runs throughout, supplemented progressively alongside each system. Without clear conveyance of information, however complete a system may be, the player has no way to perceive it.

Of the three items in Stage Six, the one most worth treating separately is the mission system
Loop, missions, and multiplayer are juxtaposed in the same stage not because they are trivial enough to bundle together — quite the opposite: they are three indispensable facets of “one round.” Among them, the mission system is the one most easily regarded as an “ancillary feature,” yet the one that least ought to be treated as such.
First, a counter-intuitive fact: a superficially complete extraction-shooter loop may be entirely idling.
Insert, scavenge, engage, extract — chained together, these four steps do allow the player to walk through from beginning to end. But absent “what the objective of this round is,” it amounts to a machine running in place: the player invests a great deal of action, yet has no reason whatsoever to venture into the deeper, more dangerous areas of the map. A rational player will arrive at an optimal strategy that dulls the experience — land, scavenge one round nearby, and immediately extract. The risk is lowest, the reward acceptable, and the loop still closes. But the round is devoid of tension, because no mechanism prompts the player to take on risk.
The mission system is precisely the mechanism that prompts the player to take on risk. It sets a primary objective for the round — destroy a facility, collect a sample, upload a piece of data — and that objective happens to lie deep in the map, and happens to take time. Once it is introduced, the player’s decisions acquire substance: is it worth venturing into danger to complete the primary objective? Is the extra reward offered by a side objective worth lingering two more minutes and enduring one more enemy wave?
More critical still is the way it couples with extraction. Extraction should not be an escape route available at any time; it should be constrained by completion of the primary objective. Extracting without completing the primary objective counts the round as only half-achieved. Only after the primary objective is complete is the extraction point truly activated, or the extraction conditions improved. This coupling welds “extraction shooting” from three independent actions into a complete narrative with an initial motive, a climax, and a cost: the player enters with a mission, completes it amid the enemy waves, and then extracts amid even fiercer waves. The tension of the “last stand” springs precisely from this: what the player defends is not only the resources scavenged, but the objective hard-won in this round.
Once the mission system is established as the driving core, the implementation path becomes clear. The prototype stage does not stack a large number of mission types; one or two are enough to validate the experience, for example “destroy a facility” and “collect a sample.” But missions are implemented in a data-driven way: objective type, objective location, completion conditions, and the coupling relationship with extraction are all abstracted into configuration, rather than hard-coded into a particular level. In this way, once the experience has been validated, extending to more mission types is merely a matter of configuration, with no need to go back and modify the core loop.
This line of thinking — “validate the experience first, then scale by configuration” — in fact runs through the entire plan; stratagem varieties, enemy types, and the wave curve all follow this path. Behind it lies a plain judgment: what the prototype stage must validate is “whether this experience holds,” not “whether the content is sufficient.” Content is the part worth scaling only after the experience holds. AI’s role here is to assist in designing well the configuration structures of these systems and in populating the first batch of data; the final judgment of “whether this experience holds” still belongs to the developer.
This is precisely why it is called the driving core rather than a feature. A feature is a component that can be appended after the fact; the driving core determines the direction in which the entire system exerts its force.
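One possible shape for the data-driven mission described above is sketched here in plain Python. It is a minimal illustration under assumptions, not the project's implementation: `MissionConfig`, `RoundState`, `ExtractionRule`, and the location tags are hypothetical names, and only the "extraction unlocks on primary completion" form of the coupling is modeled.

```python
from dataclasses import dataclass
from enum import Enum


class ExtractionRule(Enum):
    ALWAYS_OPEN = "always_open"              # extraction ignores the mission
    UNLOCK_ON_PRIMARY = "unlock_on_primary"  # extraction opens after the primary


@dataclass(frozen=True)
class MissionConfig:
    objective_type: str   # e.g. "destroy_facility", "collect_sample"
    location_tag: str     # hypothetical level marker name
    required_count: int   # completion condition
    extraction_rule: ExtractionRule


@dataclass
class RoundState:
    mission: MissionConfig
    progress: int = 0

    def report_progress(self, amount: int = 1) -> None:
        # Progress is clamped so it never exceeds the completion condition.
        self.progress = min(self.mission.required_count, self.progress + amount)

    @property
    def primary_complete(self) -> bool:
        return self.progress >= self.mission.required_count

    def extraction_available(self) -> bool:
        if self.mission.extraction_rule is ExtractionRule.ALWAYS_OPEN:
            return True
        return self.primary_complete


# Adding a mission type is a new config row, not a change to RoundState.
MISSIONS = {
    "destroy_facility": MissionConfig("destroy_facility", "deep_zone_a", 1,
                                      ExtractionRule.UNLOCK_ON_PRIMARY),
    "collect_sample": MissionConfig("collect_sample", "deep_zone_b", 3,
                                    ExtractionRule.UNLOCK_ON_PRIMARY),
}

state = RoundState(MISSIONS["collect_sample"])
state.report_progress(2)
print(state.extraction_available())  # two of three samples collected
state.report_progress(1)
print(state.extraction_available())  # primary objective complete
```

In a multiplayer build, `RoundState` would live on the server and be replicated, in line with the server-authoritative guideline set out earlier.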
🔧 Design Retrospective · The mission system: a driving core re-extracted from the “feature list”
Why this design: The loop skeleton (insert → scavenge → engage → extract) appears complete, yet lacks the link of “why enter this round.” Without an in-round objective, the extraction-shooter loop degrades into a mere mob-grinding arena. The mission system is what drives the player deep into the map and forms the tension of “complete the objective, then extract.”
Where it bit us: It did not exist in the initial plan. Reviewing the loop in hindsight exposed the problem — it was a loop lacking an initial motive. It should not be an ancillary feature of some milestone, but should couple directly with the “extraction conditions.”
How the traditional approach falls short: Treat the mission as “one more feature,” and the loop is a machine idling in place. Treat it as the driving core, and the loop acquires direction.
The kernel: what AI is delineated to undertake in this pipeline
This is what truly undergoes examination. But before discussing what AI undertakes, one must first lay out a way of decomposition that runs through the entire process — it determines where AI’s leverage actually lands.
Every experience goal is broken down into two parts: function and values. Function is the skeleton: fire can land a hit, a beacon can summon, enemies can advance in swarms — it resolves “whether it exists.” Values are the feel: how strong the recoil is, how long the TTK runs, along what curve the waves ramp up — they resolve “whether it is right.” Once the function is in place, what truly decides whether the experience holds together lies almost entirely on the values side.
The key realization: the value tuning here is experience-oriented in nature, not engineering-oriented. Shifting the recoil curve from A to B looks like changing a single config entry, but in substance it answers “does this gun feel solid enough to fire”; raising wave density by thirty percent looks like changing a spawn parameter, but in substance it answers “is the suffocating sense of being overwhelmed strong enough.” Values are not cold configuration — they are the experience itself, in its quantified form. Precisely for this reason, the act of tuning values cannot be handed off to AI entirely — because “how far to tune before it’s right” is an experiential judgment.
This way of decomposition is exactly the shared premise of the three divisions of labor below: on the “function” side AI can take over a great deal of the work, while on the “values” side it can only assist in closing the gap, with the final call left to the human.
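The function/values split can be made concrete with a small sketch. Below, the wave spawner is the "function" and stays fixed, while `WaveValues` is the "values" side that gets tuned. The names and all numbers are hypothetical placeholders, and the exponential ramp is just one possible curve chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class WaveValues:
    base_count: int          # enemies in the first wave
    growth_per_wave: float   # multiplicative ramp between waves
    density_scale: float = 1.0


def wave_sizes(values: WaveValues, waves: int) -> list[int]:
    """Function side: the spawner logic is identical whatever the values are."""
    return [
        round(values.base_count * values.growth_per_wave ** i * values.density_scale)
        for i in range(waves)
    ]


baseline = WaveValues(base_count=10, growth_per_wave=1.2)
denser = replace(baseline, density_scale=1.3)  # "raise wave density by thirty percent"

print(wave_sizes(baseline, 4))
print(wave_sizes(denser, 4))
```

Both calls run without error, which is exactly the problem: nothing in the code can say which of the two lists produces the right sense of being overwhelmed. That judgment sits outside the program.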
🔧 Design Retrospective · Why “values are experience,” not “values are configuration”
Why this design: Decomposing the experience into function and values is meant to bring the elusive question of “does the feel hold up” down onto operable objects. The values side is singled out for emphasis because it is the most easily mistaken for a purely engineering parameter — and once that happens, tuning degrades into “settling on a value that does not throw an error,” and the experience falls out of the picture entirely.
Where it bites: The danger of values is that they look too much like ordinary configuration, and are therefore all too easily brushed off by “filling in a plausible value in passing.” But behind TTK, spread, and the wave curve stands the player’s actual perception; fill in one wrong value, and the skeleton still runs while the experience has already collapsed — and that collapse throws no error, it can only be sensed by a human.
How the traditional approach falls short: The traditional division of labor often severs “values design” from “engineering implementation,” leaving values reduced to isolated entries in a spreadsheet. Only by explicitly anchoring values as “the quantified form of the experience” does tuning gain a basis for judgment — what it measures against is not some technical metric, but a concrete experience goal.
First, reference alignment. For the “feel” links such as view, movement, and shooting: the developer examines the benchmark and determines the direction, and AI is responsible for distilling hard-to-articulate sensations such as “heavy” and “solid” into a tunable parameter baseline table (camera arm length, shoulder offset, recoil curve, acceleration) and then tuning against the baseline. AI’s value here is to convert subjective feel into objective, tunable values.
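A parameter baseline table of this kind might be represented as below. This is a sketch under assumptions: the parameter names, units, tolerances, and every number are illustrative placeholders rather than measured values, and a single per-shot recoil figure stands in for what would really be a curve.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Param:
    name: str
    baseline: float   # value estimated from the reference (placeholder here)
    tolerance: float  # drift from the baseline the developer will accept
    unit: str


# Hypothetical baseline table; every number is an illustrative placeholder.
BASELINE = [
    Param("camera_arm_length", 300.0, 30.0, "cm"),
    Param("shoulder_offset", 60.0, 10.0, "cm"),
    Param("recoil_pitch_per_shot", 1.2, 0.2, "deg"),
    Param("max_acceleration", 1200.0, 150.0, "cm/s^2"),
]


def gaps(current: dict[str, float]) -> list[tuple[str, float]]:
    """List parameters outside tolerance, with signed distance from baseline."""
    out = []
    for p in BASELINE:
        delta = current[p.name] - p.baseline
        if abs(delta) > p.tolerance:
            out.append((p.name, round(delta, 3)))
    return out


current = {
    "camera_arm_length": 380.0,
    "shoulder_offset": 55.0,
    "recoil_pitch_per_shot": 1.25,
    "max_acceleration": 2048.0,
}
for name, delta in gaps(current):
    print(f"{name}: {delta:+} from baseline")
```

The table only reports distance from the baseline. Whether the baseline itself captures "heavy" and "solid" is still the developer's call after playing the build.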
Second, the machine collects data and the human evaluates. For the “verification” links such as enemies, oppression, and balance: the machine is responsible for running the scenario, instrumenting it, collecting data, and organizing the data into curves: whether the reaction window is sufficient, how long a player survives before being killed, where the pressure peak occurs. But the authority over the judgment of “whether the experience is accurate” always belongs to the developer. AI does not score.
This point warrants emphasis, because it is the one most prone to drift in execution. One common notion is to simply have AI watch a recording and score it automatically. This project deliberately does not adopt this approach. “Whether this enemy carries a sense of threat” and “whether this wave’s sense of oppression is accurate” are experience judgments, not data judgments. What AI is tasked with is to convert a fuzzy experience into objective data the developer can review, so that the judgment has a basis; but the final decision is made by the human.
This holds the central axis of the entire series: AI assists, the human decides. The verification link is no exception.
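The "machine collects, human evaluates" split could look roughly like this in code. The sketch aggregates hypothetical telemetry samples into a pressure curve and a list of survival times, and then stops: there is intentionally no scoring function. The `Sample` fields, the bucket size, and the sample data are assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Sample:
    t: float              # seconds since round start
    enemies_nearby: int   # enemies within a hypothetical threat radius


def pressure_curve(samples: list[Sample], bucket: float = 10.0) -> list[tuple[float, float]]:
    """Average nearby-enemy count per time bucket, as (bucket_start, mean)."""
    buckets: dict[float, list[int]] = {}
    for s in samples:
        start = (s.t // bucket) * bucket
        buckets.setdefault(start, []).append(s.enemies_nearby)
    return [(start, mean(vals)) for start, vals in sorted(buckets.items())]


def survival_times(spawn_times: list[float], death_times: list[float]) -> list[float]:
    """Seconds each life lasted; the two lists are paired by index."""
    return [d - s for s, d in zip(spawn_times, death_times)]


samples = [Sample(2.0, 3), Sample(8.0, 5), Sample(12.0, 10),
           Sample(18.0, 14), Sample(25.0, 6)]
curve = pressure_curve(samples)
peak_start, peak_value = max(curve, key=lambda p: p[1])

print(curve)
print("pressure peak bucket starts at", peak_start)
print(survival_times([0.0, 40.0], [35.0, 95.0]))
# Deliberately no score() function: the curves go to the developer to judge.
```

The output tells the developer where the peak falls and how long each life lasted. Whether that peak is oppressive in the intended way is left to the person reviewing it.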
Third, when building from scratch, emphasis falls on function and presentation. For an entirely new system such as the stratagem call-down: the emphasis is on the system being functionally complete and the call-down carrying satisfying feedback, with AI’s code generation receding to an auxiliary position — it assists in swiftly converting design into code, but “what form this system should take” is determined by the developer.
🔧 Design Retrospective · Is “AI does not score” a regression or a step forward
Why this design: Handing the scoring authority to AI appears more automated and more advanced. But there is no objective standard answer to whether an experience is good or bad; having AI score is no different from handing a question that has no standard answer to a respondent that will fabricate an answer in all earnestness.
Where it bit us: A more aggressive idea would be to have AI “watch the replay and assess the emotion curve.” The conclusion is that it can reason from the exported data series, but “watching the entire video stream on its own and scoring the experience” is neither feasible nor proper. Rather than package it as fully automated, it is better to draw the boundary honestly.
How the traditional approach falls short: The traditional practice is to invest manpower in conducting surveys and focus groups to measure the experience. AI does not replace this judgment, but it automates the laborious link of “collecting data and organizing curves,” so that the human’s judgment comes faster and with more basis.
Points not yet certain (listed in advance)
As is the custom, the matters that remain uncertain at present are listed:
- Whether the framework can complete a full-project build, and whether it can launch the editor — at present only some modules are confirmed to compile, and “full-project operability” has not yet been tested in practice. This is the foremost matter to resolve in Stage One, and may also constitute an obstacle in the very first week.
- How effective the “parameter baseline table” distilled by AI actually is — this is the crux of the entire “reference alignment” line of thinking. If the baseline deviates from reality, the value of this pipeline must be discounted.
- Whether the “replicable design” of multiplayer harbors hidden hazards throughout — the network layer being in place does not equate to new systems remaining consistent once integrated. Multi-client state consistency must be confirmed through actual operation.
- The numeric values in the acceptance criteria (the hit-time interval, the duration of a single round) are all placeholder values at present, and their true values will only be known once the actual output exists.
These uncertainties are precisely the reason this series is worth recording — if everything were already settled, there would be no need for validation.
What follows
This piece is the project initiation note. From the next piece onward, it will progress with the development schedule: practice first, then record. Completion will be recorded faithfully as completion; obstruction will be recorded faithfully as obstruction, including the links at which AI failed to provide effective help.
What this series sets out to validate has never been “the ceiling of AI’s capability,” but rather — whether a single developer, with the aid of AI, can advance a prototype from an existing framework to a playable state.
This answer remains undetermined; the process of validation is itself the value of this series.
Whether the judgments established in the overview piece hold true (that AI can retrieve references, translate designs, and tune values, while holding the bottom line of “the human decides”) is a question this prototype will answer three months from now.
(This article is a project initiation note for the series. All game names referenced are anonymized throughout. Subsequent practical pieces will be produced in step with the development schedule.)