API Evangelist Conversation with Clemens Vasters on JSON Structure and Bringing Sanity to Schema, Metadata, and AI

with Clemens Vasters, Principal Architect, Messaging & Real-Time Intelligence at Microsoft
June 30th, 2026

Clemens Vasters is a Principal Architect at Microsoft who has spent twenty years building the industrial-grade messaging and eventing backbone of Azure — Service Bus, Event Hubs, Event Grid, Stream Analytics, the relay, and now Microsoft Fabric Eventstreams — while representing Microsoft across messaging standards like AMQP, MQTT, CloudEvents, and xRegistry. In this conversation we dig into JSON Structure, the strictly typed data definition language he created and took to the IETF as an answer to the long, problematic history of JSON Schema. Clemens explains why JSON Schema is a fine validation language but a lousy data definition language, how he cut the dangerous parts — unconstrained any-of/all-of composition and scatter-shot dollar-ref — to leave a clean core with a real type system. We get into metadata as the missing ingredient for better LLMs, the SDK and Avrotize tooling ecosystem, how he tamed reference complexity with a two-step import-then-ref model, extension mechanisms for governance and policy, and why he believes a good spec eventually earns its adoption.

Conversation

Who are you and what do you do?

My name is Clemens Vasters, and I’m an architect at Microsoft. For twenty years I’ve worked on numerous standards — MQTT, AMQP, OPC-UA in the industrial manufacturing vertical, CloudEvents and recently xRegistry in the CNCF, and now in the IETF on something called JSON Structure. On the product side I’m the architect for Microsoft Fabric Eventstreams, which is the real-time pipeline inside the Fabric data platform, and on Azure for Event Hubs, Service Bus, Event Grid, Stream Analytics, and the relay. I’ve also co-invented services like Notification Hubs and IoT Hub that are now in the hands of other teams. In terms of scale, we crossed the line of doing about twenty trillion transactions per day on those services — it’s enormous. When you see how much growth there is in cloud revenue, that mostly translates into actual transaction growth. So I’ve done a lot at Microsoft in twenty years.

What has your journey at Microsoft looked like?

I live in western Germany, about a fifteen-minute drive to the Dutch border, and I do all my work remotely. My team is in the US, though Microsoft has grown and we now have people in Prague, Warsaw, and all around Europe. In the early 2000s I had my own company, and we taught a substantial portion of the Microsoft field worldwide how to use .NET, because the field was very unprepared for that revolution going from VB6 and C++ to this new language. We wrote a large curriculum for it. Eventually they convinced me it would be a good idea to join the Windows Communication Foundation team, which was the Indigo team at the time. What is now Azure Relay started in 2006, when I had just arrived. I joined that effort, and I can basically draw a straight line from that incubation to where we are now. On Sunday it was my twentieth anniversary.

What is JSON Structure?

Let me tell the origin story. When you work with JSON, JSON Schema is naturally the thing you fall into, and so did we. We were building xRegistry, a schema and endpoint registry, and inside Microsoft Fabric we needed a typed experience — we’re building a data funnel whose end is a data lake or a database table, so I need to take documents and land them in columns and rows in a way a query engine, and ultimately something like Copilot, can understand. As we got serious about building tooling on JSON Schema, we ran into the problem everybody runs into. Look at OpenAPI leaning on JSON Schema — you won’t find any code generator that doesn’t give up at some point, because JSON Schema is good at defining the shape of a document, but it’s lousy as a data definition language. It’s really a validation language. So JSON Structure takes the familiar shape of JSON Schema, cuts out everything causing the complication and harm, and gives you a proper type system with a well-defined mapping into JSON.

Do you see JSON Structure bringing some sanity to AI?

It all starts with a good spec. There are three layers. First, make the tool understand what the schema language and the type system are — JSON Schema’s foundational spec isn’t very good, so the knowledge an LLM draws on is secondary sources, not the primary source. The way I use JSON Structure is I stash the specs into the context, or give my agent links so it can reach for them, and the specs are written so agents do pretty well with them. Second, because I have a clear, well-defined type system — an int32, a decimal with precision and scale, a datetime — plus companion specs for alternate names, scientific units, and currencies, you get a clearer picture. If you give an LLM a field called temperature typed as number, what is that — Fahrenheit, Celsius, Kelvin? You can’t know. You need a description, and ideally a formal way of defining units. Then on top you put the data, with a clear link to the schema that governs it.
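To make the difference between a bare number and a typed, annotated field concrete, here is a minimal Python sketch. It assumes a schema fragment shaped like the one described above; the keyword names `precision`, `scale`, and `unit`, the unit code `Cel`, and the helper `check_value` are illustrative assumptions and are not quoted from the JSON Structure specs or SDKs.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Hypothetical schema fragment. The keyword names below are assumptions made
# for illustration; consult the JSON Structure specs for the real vocabulary.
READING_SCHEMA = {
    "type": "object",
    "name": "Reading",
    "properties": {
        "sensorId": {"type": "int32"},
        "temperature": {"type": "decimal", "precision": 5, "scale": 2, "unit": "Cel"},
        "takenAt": {"type": "datetime"},
    },
}

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def check_value(value, field):
    """Return True if value fits the declared type of field."""
    kind = field["type"]
    if kind == "int32":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return (isinstance(value, int) and not isinstance(value, bool)
                and INT32_MIN <= value <= INT32_MAX)
    if kind == "decimal":
        # Decimals are passed as strings here to avoid float rounding.
        try:
            number = Decimal(value)
        except (InvalidOperation, TypeError, ValueError):
            return False
        if not number.is_finite():
            return False
        _, digits, exponent = number.as_tuple()
        fraction_digits = max(-exponent, 0)
        total_digits = len(digits) + max(exponent, 0)
        return (total_digits <= field["precision"]
                and fraction_digits <= field["scale"])
    if kind == "datetime":
        try:
            datetime.fromisoformat(value)
        except (TypeError, ValueError):
            return False
        return True
    raise ValueError(f"unsupported type: {kind}")
```

With a declaration like this, a consumer (human or LLM) no longer has to guess whether `temperature` is Celsius or how many fractional digits to expect; the schema states both.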

What is the role of metadata?

We’re now living in a world where LLMs write code, interpret information, and answer questions directly. But if we don’t give the LLM information about the semantics of data — if we don’t enrich that data with enough context — they will make more mistakes. The more context you provide, the better the outcome. So metadata in the age of AI plays a huge role, and having a language that can describe that data well, with a good spec, is very helpful. This is why JSON Structure is tightly coupled with my open source Avrotize work doing schema translations between everything — metadata translation is just as important as data translation, so you don’t lose that information. The goal is that you can take an XML Schema, translate it into JSON Structure capturing all the semantics including relationships, translate it further, and have code read it back out with no loss of fidelity. JSON Structure is designed to act as a hinge.

How can you extend JSON Structure?

If you want your own set of extra attributes — I’m not going to call it vocabulary, that’s a little too metadata-geeky — you can write an extension spec and refer to it with a dollar-uses clause. So in your schema you have a clear declaration that this schema is using the following extra spec, and that extension then lets you put extra attributes onto your definitions. One of the complications with vocabularies in JSON Schema is that they’re a completely foreign, external thing not expressed in the schema language itself, which is a bit strange. I wanted to make sure these extension specs are something you express in the same language, to keep it relatively simple. Throughout, I’ve tried not to be a metadata geek — my primary focus is that a normal business application developer can read the core spec, understand it, and describe the street-address object they need, while still having features the metadata folks might like.
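The idea of declaring extensions up front can be sketched in a few lines of Python. Only the `$uses` keyword comes from the conversation; the extension names `ExampleUnits` and `ExampleGovernance`, the attributes they contribute, and the function `undeclared_attributes` are hypothetical. Whether real JSON Structure tooling rejects undeclared attributes is not stated here, so treat this as one possible check rather than defined behavior.

```python
# Hypothetical registry: extension name -> attributes it adds to definitions.
EXTENSION_ATTRIBUTES = {
    "ExampleUnits": {"unit"},
    "ExampleGovernance": {"classification"},
}


def undeclared_attributes(schema):
    """List (field, attribute) pairs that use an extension not named in $uses."""
    declared = set(schema.get("$uses", []))
    allowed = set()
    for name in declared:
        allowed |= EXTENSION_ATTRIBUTES.get(name, set())
    # Every attribute that belongs to some known extension.
    known = set().union(*EXTENSION_ATTRIBUTES.values())

    problems = []
    for field_name, field in schema.get("properties", {}).items():
        for attribute in field:
            if attribute in known and attribute not in allowed:
                problems.append((field_name, attribute))
    return problems


SCHEMA = {
    "$uses": ["ExampleUnits"],  # declares the units extension only
    "type": "object",
    "properties": {
        "temperature": {"type": "number", "unit": "Cel"},
        "owner": {"type": "string", "classification": "personal"},
    },
}
```

Calling `undeclared_attributes(SCHEMA)` reports the `classification` attribute on `owner`, because the schema never declared the extension that would introduce it.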

How did you approach the dollar-ref complexity?

The dollar-ref construct in JSON Schema can occur anywhere, which causes all kinds of complexity — relative paths, absolute paths, cross-network references, authentication, access control, caching — all of it basically thrown in your face with nothing defined around it. I’ve made dollar-ref illegal in JSON Structure except for cross-referencing reusable types, which must live in the definitions section — I stuck to the JSON Schema draft-7 convention there. If you want to bring in another file, there’s an import extension. It’s a two-step process: you import the entire document whole, map it into a namespace to disambiguate it, and only then can you pick something out of it with dollar-ref. It’s exactly like modern programming languages — you don’t point at a file and a line and compile that piece in, you load a package and use the types in it. Dollar-import plus dollar-ref is that same module concept, and all the path ambiguity completely goes away.
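The two-step model can be sketched as a small Python resolver. This is an illustration of the idea, not the import extension itself: the `imports` map is a stand-in for the real import keyword (whose exact name and placement are defined by the extension spec and not reproduced here), and `DOCUMENTS`, `import_documents`, and `resolve_ref` are hypothetical names.

```python
# Hypothetical in-memory "network": URI -> schema document.
DOCUMENTS = {
    "https://example.com/schemas/common.json": {
        "definitions": {
            "Address": {
                "type": "object",
                "properties": {"street": {"type": "string"}},
            }
        }
    }
}

ORDER_SCHEMA = {
    # Stand-in for the import declaration: namespace -> document URI.
    "imports": {"common": "https://example.com/schemas/common.json"},
    "definitions": {"OrderId": {"type": "string"}},
}


def import_documents(schema, fetch):
    """Step 1: load each imported document whole and bind it to a namespace."""
    return {ns: fetch(uri) for ns, uri in schema.get("imports", {}).items()}


def resolve_ref(ref, schema, namespaces):
    """Step 2: resolve a reference that may only point into definitions."""
    prefix = "#/definitions/"
    if not ref.startswith(prefix):
        # No file paths, no URLs, no pointers into arbitrary document parts.
        raise ValueError(f"reference must start with {prefix!r}: {ref}")
    parts = ref[len(prefix):].split("/")
    if len(parts) == 1:
        return schema["definitions"][parts[0]]
    if len(parts) == 2 and parts[0] in namespaces:
        return namespaces[parts[0]]["definitions"][parts[1]]
    raise ValueError(f"unknown namespace or path: {ref}")


namespaces = import_documents(ORDER_SCHEMA, DOCUMENTS.__getitem__)
address = resolve_ref("#/definitions/common/Address", ORDER_SCHEMA, namespaces)
```

Because every reference is a local name inside the importing document, the resolver never has to reason about relative paths, network locations, or where the referencing file happens to live.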

How can JSON Structure be used by other specifications?

The baseline use case is the same bread-and-butter thing as JSON Schema: define the request and response message in OpenAPI — the fields, the parameters — or define a database table with types that map cleanly into the database. What was important to me is that this looks exactly the same as it does today. If you look at an OpenAPI definition, it’s already mostly JSON Structure because the shape is the same. But if the OpenAPI team picked up JSON Structure, all of a sudden they’d get a better type system and all the further mechanisms they could enable. The spec is effectively an offer to the world — a core plus six extension specs, with a set of SDKs and tooling around it. We’re using it because we built it to solve our own problems, and anybody else who wants to use it is invited. I put it into the IETF, even just as a draft, so nobody needs to worry that I’ll do a rug-pull as the evil Microsoft guy.

What is the tooling ecosystem for JSON Structure?

I’ve built an SDK, and it’s actually a proof point that a good spec helps: it would have taken half a year to build two years ago, but it took two weeks, because I have digital assistants now that can code, and they’re really good when you give them a spec and validated examples. The SDKs cover about eight languages, including C#, C++, Rust, Python, Perl, TypeScript, and JavaScript, and they’re all already distributed on the package managers, so you can get the JSON Structure package from PyPI and the rest. They let you validate a schema and validate a document against a schema, and they understand the core spec and the extensions. Then there’s Avrotize, written in Python, which translates between schemas and has code generators. You put a schema in and you don’t get one ginormous file you can’t look at. Instead you get a neatly organized project that compiles out of the box, with a README, documentation, and your descriptions landing as comments on the fields. The objects are self-serializing, and the C# generator can also do CBOR, MessagePack, XML, and Protobuf, so one class can be a hinge between all those formats.

What is the relationship between the tooling and the spec?

There’s also a question of spec quality that I’ve frankly been annoyed by. You’ll find very few people who actually learn JSON Schema from the specifications, because the specifications just aren’t very good — you start reading and you’re halfway in and you still don’t know what you’re looking at. So I wanted JSON Structure to keep the familiar shape, the part that’s easy — type, object, properties, a map of names to types — because everybody in software development knows that. The easy traps come with the conditional composition constructs, any-of, all-of, one-of, which let you very quickly build something that’s no longer representable in a database or a programming language because there are no guardrails. So I built the tooling and the spec together, and the fact that the assistants do well with the spec is itself the evidence that writing it clearly pays off. The SDK existing in two weeks instead of half a year is the proof that a good, clear spec is what makes the whole tooling story possible.
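To illustrate why the composition constructs trouble code generators, the following Python sketch walks a JSON Schema document and reports every place where `anyOf`, `allOf`, or `oneOf` appears. The keywords are standard JSON Schema; the function `find_composition` and the sample schema are hypothetical, and the walk is deliberately naive (a property that happens to be named `anyOf` would also be flagged).

```python
COMPOSITION_KEYWORDS = ("anyOf", "allOf", "oneOf")


def find_composition(node, path="#"):
    """Return the paths of all composition keywords found under node."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}/{key}"
            if key in COMPOSITION_KEYWORDS:
                hits.append(child)
            hits.extend(find_composition(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_composition(item, f"{path}/{index}"))
    return hits


# "contact" may be a plain string or an object: valid for validation,
# but there is no single column type or class field that represents it.
CONTACT_SCHEMA = {
    "type": "object",
    "properties": {
        "contact": {
            "anyOf": [
                {"type": "string"},
                {"type": "object", "properties": {"email": {"type": "string"}}},
            ]
        }
    },
}

hits = find_composition(CONTACT_SCHEMA)
```

Each reported path marks a spot where a generator targeting a database table or a statically typed class has to guess, give up, or fall back to an untyped value.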

How do you manage schema at scale?

In the core spec you are just as constrained as you are with Avro — there are no external references, so you can rely on the fact that a core schema is entirely self-contained, which is a quality unto itself. When you do need to reach across boundaries — your team’s schema, common schemas from your domain, industry standards for banking and so on — there’s the import extension. You say you’re importing this file, it resides over here, and you map it into a namespace inside your document; then all the references you make point into that mapped document. It’s a two-step process, not “resolve a file and point straight into some particle of it.” That makes referencing much easier because the ambiguity of what path a particular type needs to live at, depending on where you reference it from, completely goes away. And then xRegistry, the work we’re doing in the CNCF, is how you manage versions of these schemas in a registry in a standardized way, either as documents or as APIs. It all ties together — Avrotize, the SDKs, the spec, and the registry.

What does policy management look like across schema?

For streaming data there’s an interesting challenge: how do you maintain the speed of a pipeline while making sure no data that shouldn’t be there gets through? You need to move filtering into the pipeline, with tooling that can look at what might pass and govern it — one subscriber may get the data, another may not. The way you do that is you write schemas and annotate them, hanging rule sets off types, and that requires a schema language that allows those extensions. That’s in the back of my mind — it’s not product reality yet, but it’s certainly possible with the foundation JSON Structure lays, because it allows arbitrary extensions at any point where you could anchor those rule sets. We already have tooling at Microsoft like Purview for classifying data in databases, and Apache Atlas does similar metadata work. The deeper point is that most Kafka systems work today only because the publisher and consumer are the same people, or at least in the same room. Once they’re not — and the other side may not even be human — you need a neutral ground to communicate the data needs through metadata, with strong tooling that can pick it up.
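As the answer notes, this is not product reality yet, so the following Python sketch is purely speculative. It shows one way rule sets hung off schema fields could drive in-pipeline filtering per subscriber. The annotation name `x-allowedAudiences`, the audience labels, and the function `project_for` are all invented for illustration and do not come from JSON Structure, Purview, or any Microsoft product.

```python
# Hypothetical annotated schema: fields without the annotation are public,
# annotated fields are visible only to the listed audiences.
PAYMENT_SCHEMA = {
    "type": "object",
    "properties": {
        "orderId": {"type": "string"},
        "total": {"type": "decimal"},
        "cardNumber": {"type": "string", "x-allowedAudiences": ["billing"]},
    },
}


def project_for(event, schema, audience):
    """Return a copy of event containing only fields this audience may see."""
    visible = {}
    for name, value in event.items():
        field = schema["properties"].get(name)
        if field is None:
            continue  # fields unknown to the schema never pass through
        allowed = field.get("x-allowedAudiences")
        if allowed is None or audience in allowed:
            visible[name] = value
    return visible


EVENT = {
    "orderId": "A-17",
    "total": "19.99",
    "cardNumber": "4111111111111111",
    "debug": "internal",
}

for_billing = project_for(EVENT, PAYMENT_SCHEMA, "billing")
for_analytics = project_for(EVENT, PAYMENT_SCHEMA, "analytics")
```

The filter needs nothing but the schema and the subscriber's identity, which is the property that would let it run inside the pipeline without slowing it down with out-of-band lookups.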

Clemens Vasters
Principal Architect, Messaging & Real-Time Intelligence at Microsoft

Clemens Vasters is a Principal Architect for Messaging & Real-Time Intelligence at Microsoft, where over twenty years he has helped build and operate the company's hyper-scale messaging services — Service Bus, Event Hubs, Event Grid, Stream Analytics, the Azure Relay, and Microsoft Fabric Eventstreams — and co-invented services like Notification Hubs and IoT Hub. Based in western Germany and working remotely with a team across the US and Europe, he represents Microsoft in messaging standardization across OASIS (AMQP, MQTT) and the CNCF (CloudEvents, xRegistry). He is the creator and author of JSON Structure, a strictly typed data definition language being developed as an IETF Internet-Draft, along with its SDKs and the Avrotize schema-translation tooling.