Using AI in Systems Engineering: From Experiment to Insight
How structured pipelines, human validation, and real experiments reveal what AI can actually do
AI has arrived everywhere – at home, in schools, on our smartphones. But at work, it’s becoming more than a personal productivity tool. It’s entering business processes.
In systems engineering, though, the question is: can we actually trust it?
I’d been using LLMs for proofreading and brainstorming. But Systems Engineering? That’s different. We’re not writing blog posts; we’re designing safety-critical systems with requirements that must be traceable, testable, and correct.
So when my manager asked “How can AI be used in Systems Engineering?” I was excited. But also sceptical.
Then the industry started talking about “agentic workflows” - autonomous agents that work independently. That sounds impressive. But for Requirements Engineering? When these requirements are used in real-world systems, we need them to be correct, complete, and free of hallucinations.
I had no deep knowledge of agentic workflows or AI pipelines. So I did what most engineers would do: I asked Claude to show me what’s possible. Not to build a product. Just to understand: What can AI actually do today? And how does it work?
I started with a simple question: “Can you help me build a pipeline that derives system requirements from customer requirements?” - Seven minutes later, I had a working prototype.
Seven minutes of conversation with Claude - describing what I wanted, iterating on the design, refining the approach. The first thing it built: A pipeline for deriving system requirements from customer requirements, with built-in validation rules.
It looked like this:
1. Requirement Interpretation Agent
2. Structuring Agent
3. Quality Validator
4. Consistency Checker
5. Traceability Matrix Generator
At first, I had no control over the prompts. The pipeline worked, but it was a black box. So I asked for an update: “Let me edit the prompts myself.”
Claude updated the interface immediately. Now I could see every prompt, edit it, and inject domain knowledge. This changed the rules: I wasn’t just using AI – I was controlling it.
That changed everything.
Generic prompts produce generic output.
But when I could write: “Check if this requirement follows our company’s naming convention for safety-critical systems” - the quality jumped, and we were in control.
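Injecting domain knowledge can be as simple as keeping prompts as editable templates with a slot for project rules. A sketch - the naming convention below is made up for illustration, not a real company standard:

```python
# Sketch: the validator prompt is an editable template, so engineers
# can inject their own rules instead of relying on generic prompts.
QUALITY_VALIDATOR_PROMPT = """\
You are a requirements quality validator.
Check the requirement below against these project rules:
{domain_rules}

Requirement:
{requirement}
"""

# Illustrative domain rules an engineer might inject.
domain_rules = (
    "- Safety-critical requirements use the prefix 'SR-SAFE-'.\n"
    "- Every requirement contains exactly one SHALL statement."
)

prompt = QUALITY_VALIDATOR_PROMPT.format(
    domain_rules=domain_rules,
    requirement="SR-SAFE-017: The system SHALL disable the actuator on fault.",
)
```

Changing the rules changes the pipeline’s behavior without touching any code - which is exactly the control that generic, hard-coded prompts deny you.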
Then I wanted more. If I can derive requirements, can I generate test cases from them?
I asked Claude to build a test case generator. Same approach: a pipeline that takes system requirements and generates test cases across the V-Model: Unit, Integration, System, Acceptance tests.
Again, I wanted full control over prompts. Again, Claude complied.
Now I had two pipelines that connected:
• UC1: Customer Requirements → System Requirements
• UC2: System Requirements → Test Cases
Seeing that in action made me realize: these aren’t independent tools – they form a system. UC1 creates the requirements that UC2 needs. The validation criteria from UC1 inform the test generation in UC2. The traceability matrix connects them. Changes in one cascade to the other.
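The wiring between the two use cases can be sketched in a few lines. The two functions below are stubs standing in for the UC1 and UC2 pipelines, so the traceability structure is visible; the ID formats are illustrative:

```python
# Sketch of how UC1 and UC2 connect through a traceability matrix.
def derive_system_requirements(customer_req: str) -> list[dict]:
    """UC1 stub: customer requirement -> system requirements."""
    return [{"id": "SYS-001",
             "text": f"The system SHALL satisfy: {customer_req}"}]

def generate_test_cases(system_req: dict) -> list[dict]:
    """UC2 stub: system requirement -> V-Model test cases."""
    return [{"id": f"TC-{level}-{system_req['id']}", "level": level}
            for level in ("Unit", "Integration", "System", "Acceptance")]

def build_trace(customer_req: str) -> dict:
    """Traceability: customer req -> system reqs -> test cases."""
    trace = {}
    for sys_req in derive_system_requirements(customer_req):
        tests = generate_test_cases(sys_req)
        trace[sys_req["id"]] = {"source": customer_req,
                                "tests": [t["id"] for t in tests]}
    return trace
```

Because the trace is explicit, a change to a customer requirement can be followed forward to exactly the system requirements and test cases it affects.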
This is not the future anymore; we are living it.
Claude built these pipelines in minutes. They are not production-ready by any means; they are a proof of concept. They show what’s possible today, right now, with technology anyone can access.
Not magic. Not science fiction. Just structured prompts, clear architecture, and an LLM that can follow instructions. The question shifted from “Can AI do this?” to “How do we design this to actually work?” As I mentioned earlier, what I built was a pipeline, not an autonomous agent. That distinction matters.
A pipeline is a sequence of controlled steps. Each prompt is defined, visible, and editable. As engineers, we decide how each step works.
Autonomous agents may produce results, but the process remains opaque. Without visibility into how decisions are made, you cannot debug or systematically improve the system. In safety-critical systems, autonomy without traceability is not innovation – it is risk.
Furthermore, LLMs are non-deterministic. If you run the same prompt twice, you get different answers. That’s how they work – there’s randomness built into the generation process. For creative writing? That’s fine, even desirable. For requirements that are used in safety-critical systems? That’s a big no-go.
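As a toy illustration of where that randomness comes from, here is temperature-based sampling over a three-token vocabulary. Real models do the same over tens of thousands of tokens; this is not Claude’s actual implementation, just the mechanism:

```python
import math
import random

def sample(logits: dict[str, float], temperature: float,
           rng: random.Random) -> str:
    """Sample one token from temperature-scaled softmax probabilities."""
    scaled = {tok: lg / temperature for tok, lg in logits.items()}
    norm = sum(math.exp(v) for v in scaled.values())
    probs = {tok: math.exp(v) / norm for tok, v in scaled.items()}
    r, acc = rng.random(), 0.0
    for tok, p in probs.items():
        acc += p
        if r <= acc:
            return tok
    return tok  # guard against floating-point rounding

# Toy next-token scores: "shall" is most likely, but not certain.
logits = {"shall": 2.0, "should": 1.5, "may": 0.5}
```

At a normal temperature, repeated runs pick different tokens; as temperature approaches zero, the distribution collapses onto the most likely token - which is why the same prompt can yield different requirements on different runs.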
Non-determinism is acceptable - as long as control and validation are in place. This is how we can make it work:
1. Human validation is mandatory
It is important to highlight here: engineers review every output. The AI suggests and humans decide. That’s the workflow.
For example: If an AI-generated requirement says: “The system SHALL respond in less than 2 seconds” and the engineer knows the hardware can’t support that - they can change it.
The AI isn’t making final decisions. It’s doing mechanical work that humans then verify.
2. Pipelines give you visibility
Because each step is explicit, you can see exactly what happened:
• Step 1: interpreted the customer requirement
• Step 2: structured it in IEEE 830 format
• Step 3: checked it against IREB criteria
• Step 4: found a potential conflict with SR-042
You can trace the output. You can see where it went wrong. You can fix the prompt at Step 3 if it’s producing false information.
It seems that autonomous agents cannot give you that degree of visibility. Besides, how do we know the agent knows that the hardware doesn’t support a 2-second response time?
Currently, agents give you a result. You don’t know what path they took. You don’t know what they considered and rejected. You can’t debug them. You can’t improve them systematically.
This shows the bigger picture:
Systems people don’t understand are systems they can’t control. That means we need to build systems transparent enough to trust and to use. The experiment did not prove that AI can replace engineering work. It showed something more important.
The value is not in the AI itself, but in how it is structured.
When AI is treated as a controllable system—broken into steps, with visible logic and human validation—it becomes usable in engineering contexts. Not because it is perfect, but because it is understandable.
That is the shift: AI is no longer just a tool. It becomes part of the system we design.
If we can capture engineering decisions, patterns, and feedback from everyday work and turn them into structured, reusable knowledge, then AI becomes a mechanism for scaling expertise rather than replacing it.

