Question 1

Who are you and what do you do?

Accepted Answer

My name is Clemens Vasters, and I'm an architect at Microsoft. For twenty years I've worked on numerous standards: MQTT, AMQP, OPC-UA in the industrial manufacturing vertical, CloudEvents and recently xRegistry in the CNCF, and now JSON Structure in the IETF. On the product side I'm the architect for Microsoft Fabric Eventstreams, which is the real-time pipeline inside the Fabric data platform, and on Azure for Event Hubs, Service Bus, Event Grid, Stream Analytics, and Azure Relay. I've also co-invented services like Notification Hubs and IoT Hub that are now in the hands of other teams. In terms of scale, we've crossed the line of about twenty trillion transactions per day on those services, which is enormous. The growth you see in cloud revenue mostly translates into actual transaction growth. So I've done a lot at Microsoft in twenty years.

Question 2

What has your journey at Microsoft looked like?

Accepted Answer

I live in western Germany, about a fifteen-minute drive from the Dutch border, and I do all my work remotely. My team is in the US, though Microsoft has grown and we now have people in Prague, Warsaw, and all around Europe. In the early 2000s I had my own company, and we taught a substantial portion of the Microsoft field worldwide how to use .NET, because the field was very unprepared for the revolution of going from VB6 and C++ to this new language. We wrote a large curriculum for it. Eventually they convinced me it would be a good idea to join the Windows Communication Foundation team, which was the Indigo team at the time. What is now Azure Relay started in 2006, when I had just arrived. I joined that effort, and I can basically draw a straight line from that incubation to where we are now. On Sunday it was my twentieth anniversary.

Question 3

What is JSON Structure?

Accepted Answer

Let me tell the origin story. When you work with JSON, JSON Schema is naturally the thing you fall into, and so did we. We were building xRegistry, a schema and endpoint registry, and inside Microsoft Fabric we needed a typed experience — we're building a data funnel whose end is a data lake or a database table, so I need to take documents and land them in columns and rows in a way a query engine, and ultimately something like Copilot, can understand. As we got serious about building tooling on JSON Schema, we ran into the problem everybody runs into. Look at OpenAPI leaning on JSON Schema — you won't find any code generator that doesn't give up at some point, because JSON Schema is good at defining the shape of a document, but it's lousy as a data definition language. It's really a validation language. So JSON Structure takes the familiar shape of JSON Schema, cuts out everything causing the complication and harm, and gives you a proper type system with a well-defined mapping into JSON.

Question 4

Do you see JSON Structure bringing some sanity to AI?

Accepted Answer

It all starts with a good spec. There are three layers. First, make the tool understand what the schema language and the type system are. JSON Schema's foundational spec isn't very good, so the knowledge an LLM draws on comes from secondary sources, not the primary source. The way I use JSON Structure is I stash the specs into the context, or give my agent links so it can reach for them, and the specs are written so agents do pretty well with them. Second, because I have a clear, well-defined type system (an int32, a decimal with precision and scale, a datetime) plus companion specs for alternate names, scientific units, and currencies, you get a clearer picture. If you give an LLM a field called temperature typed as number, what is that: Fahrenheit, Celsius, Kelvin? You can't know. You need a description, and ideally a formal way of defining units. Third, on top you put the data, with a clear link to the schema that governs it.
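
A minimal sketch of the second layer, written in Python with plain dictionaries. The type names int32, decimal, and datetime come from the answer above. Everything else is an assumption made for illustration: the "unit" annotation key, the UCUM-style unit symbols, the field names, and the to_kelvin helper are not quoted from the JSON Structure specs.

```python
# Illustrative only: a schema-like dictionary with explicit types and a unit.
# The "unit" key, the unit symbols, and the field names are assumptions.
READING_SCHEMA = {
    "type": "object",
    "name": "SensorReading",
    "properties": {
        "sensorId": {"type": "int32"},
        "temperature": {
            "type": "decimal",
            "precision": 5,
            "scale": 2,
            "unit": "Cel",
            "description": "Air temperature at the sensor",
        },
        "measuredAt": {"type": "datetime"},
    },
}

# Conversions to kelvin for the unit symbols this sketch understands.
_TO_KELVIN = {
    "K": lambda v: v,
    "Cel": lambda v: v + 273.15,
    "[degF]": lambda v: (v - 32.0) * 5.0 / 9.0 + 273.15,
}


def to_kelvin(schema: dict, field: str, value: float) -> float:
    """Convert a value to kelvin using the unit declared on the field."""
    unit = schema["properties"][field].get("unit")
    if unit is None:
        # A field without a declared unit gives a reader nothing to go on.
        raise ValueError(f"field {field!r} declares no unit")
    if unit not in _TO_KELVIN:
        raise ValueError(f"unsupported unit {unit!r}")
    return _TO_KELVIN[unit](value)
```

With the unit declared on the field, a consumer (a person, a code generator, or an LLM) no longer has to guess the scale. A field that lacks the annotation fails loudly instead of being silently misread.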

Question 5

What is the role of metadata?

Accepted Answer

We're now living in a world where LLMs write code, interpret information, and answer questions directly. But if we don't give the LLM information about the semantics of data — if we don't enrich that data with enough context — they will make more mistakes. The more context you provide, the better the outcome. So metadata in the age of AI plays a huge role, and having a language that can describe that data well, with a good spec, is very helpful. This is why JSON Structure is tightly coupled with my open source Avrotize work doing schema translations between everything — metadata translation is just as important as data translation, so you don't lose that information. The goal is that you can take an XML Schema, translate it into JSON Structure capturing all the semantics including relationships, translate it further, and have code read it back out with no loss of fidelity. JSON Structure is designed to act as a hinge.

Question 6

How can you extend JSON Structure?

Accepted Answer

If you want your own set of extra attributes — I'm not going to call it vocabulary, that's a little too metadata-geeky — you can write an extension spec and refer to it with a dollar-uses clause. So in your schema you have a clear declaration that this schema is using the following extra spec, and that extension then lets you put extra attributes onto your definitions. One of the complications with vocabularies in JSON Schema is that they're a completely foreign, external thing not expressed in the schema language itself, which is a bit strange. I wanted to make sure these extension specs are something you express in the same language, to keep it relatively simple. Throughout, I've tried not to be a metadata geek — my primary focus is that a normal business application developer can read the core spec, understand it, and describe the street-address object they need, while still having features the metadata folks might like.
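
The declaration idea can be sketched as a small check in Python: extra attributes are accepted only if the schema names the extension that enables them. The "$uses" key is the written form of the dollar-uses clause mentioned above. The extension names, the attribute names, and the EXTENSION_ATTRIBUTES table are hypothetical and exist only for this example.

```python
# Hypothetical table: which extra attributes each extension spec enables.
# Extension names and attribute names are invented for this sketch.
EXTENSION_ATTRIBUTES = {
    "ExampleUnits": {"unit"},
    "ExampleAltNames": {"altnames"},
}


def undeclared_attributes(schema: dict) -> list[str]:
    """Return paths of extension attributes used without a matching $uses entry."""
    allowed = set()
    for ext in schema.get("$uses", []):
        allowed |= EXTENSION_ATTRIBUTES.get(ext, set())
    known = set().union(*EXTENSION_ATTRIBUTES.values())
    problems = []

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                if key in ("properties", "definitions") and isinstance(value, dict):
                    # Keys in these maps are user-chosen names, not attributes.
                    for name, sub in value.items():
                        walk(sub, f"{path}/{key}/{name}")
                    continue
                if key in known and key not in allowed:
                    problems.append(f"{path}/{key}")
                walk(value, f"{path}/{key}")
        elif isinstance(node, list):
            for index, item in enumerate(node):
                walk(item, f"{path}/{index}")

    walk(schema, "#")
    return problems
```

A schema that declares only the hypothetical units extension but also uses "altnames" on a property would be reported at the path of that attribute, which is the kind of clear, in-document signal the answer describes.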

Question 7

How did you approach the dollar-ref complexity?

Accepted Answer

The dollar-ref construct in JSON Schema can occur anywhere, which causes all kinds of complexity — relative paths, absolute paths, cross-network references, authentication, access control, caching — all of it basically thrown in your face with nothing defined around it. I've made dollar-ref illegal in JSON Structure except for cross-referencing reusable types, which must live in the definitions section — I stuck to the JSON Schema draft-7 convention there. If you want to bring in another file, there's an import extension. It's a two-step process: you import the entire document whole, map it into a namespace to disambiguate it, and only then can you pick something out of it with dollar-ref. It's exactly like modern programming languages — you don't point at a file and a line and compile that piece in, you load a package and use the types in it. Dollar-import plus dollar-ref is that same module concept, and all the path ambiguity completely goes away.
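
The restriction on dollar-ref can be sketched in a few lines of Python. This is not the SDK's implementation. It is a hypothetical resolve_ref helper that assumes references are JSON Pointers of the form "#/definitions/..." and that the sample document's layout (including "$ref" sitting inside "type") is representative.

```python
# Sample document for the sketch; names and layout are assumptions.
DOC = {
    "definitions": {
        "Address": {"type": "object", "properties": {"street": {"type": "string"}}},
        "Geo": {
            "Point": {"type": "object", "properties": {"lat": {"type": "double"}}}
        },
    },
    "type": "object",
    "properties": {"home": {"type": {"$ref": "#/definitions/Address"}}},
}


def resolve_ref(document: dict, ref: str) -> dict:
    """Resolve a reference that must point into the document's own definitions."""
    prefix = "#/definitions/"
    if not ref.startswith(prefix):
        # No relative files, no URLs, no pointers into arbitrary places.
        raise ValueError(f"only references into definitions are allowed: {ref!r}")
    node = document.get("definitions", {})
    for segment in ref[len(prefix):].split("/"):
        # Undo JSON Pointer escaping (RFC 6901): ~1 is "/", ~0 is "~".
        segment = segment.replace("~1", "/").replace("~0", "~")
        if not isinstance(node, dict) or segment not in node:
            raise KeyError(f"unresolved reference: {ref!r}")
        node = node[segment]
    return node
```

Because every reference is local, the resolver needs no network access, no base-URI bookkeeping, and no cache. Anything that does not start in the definitions section is rejected up front.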

Question 8

How can JSON Structure be used by other specifications?

Accepted Answer

The baseline use case is the same bread-and-butter thing as JSON Schema: define the request and response message in OpenAPI — the fields, the parameters — or define a database table with types that map cleanly into the database. What was important to me is that this looks exactly the same as it does today. If you look at an OpenAPI definition, it's already mostly JSON Structure because the shape is the same. But if the OpenAPI team picked up JSON Structure, all of a sudden they'd get a better type system and all the further mechanisms they could enable. The spec is effectively an offer to the world — a core plus six extension specs, with a set of SDKs and tooling around it. We're using it because we built it to solve our own problems, and anybody else who wants to use it is invited. I put it into the IETF, even just as a draft, so nobody needs to worry that I'll do a rug-pull as the evil Microsoft guy.
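
To illustrate the database-table use case, here is a hedged Python sketch that turns a typed object definition into a generic SQL CREATE TABLE statement. The SQL_TYPES mapping, the default precision of 18, the "name" and "required" keys, and the sample ORDER schema are assumptions chosen for the example and do not come from the JSON Structure specs or from any real generator.

```python
# Assumed mapping from schema type names to generic SQL column types.
SQL_TYPES = {
    "string": "VARCHAR",
    "int32": "INTEGER",
    "int64": "BIGINT",
    "boolean": "BOOLEAN",
    "datetime": "TIMESTAMP",
}

# Sample schema for the sketch; field names are invented.
ORDER = {
    "type": "object",
    "name": "Orders",
    "properties": {
        "id": {"type": "int64"},
        "total": {"type": "decimal", "precision": 12, "scale": 2},
        "placedAt": {"type": "datetime"},
    },
    "required": ["id"],
}


def column_type(prop: dict) -> str:
    """Map one property definition to a SQL column type."""
    if prop["type"] == "decimal":
        # Precision and scale carry over directly; defaults are assumptions.
        return f"DECIMAL({prop.get('precision', 18)}, {prop.get('scale', 0)})"
    return SQL_TYPES[prop["type"]]


def create_table(schema: dict) -> str:
    """Render a CREATE TABLE statement for a flat object schema."""
    required = set(schema.get("required", []))
    columns = []
    for name, prop in schema["properties"].items():
        not_null = " NOT NULL" if name in required else ""
        columns.append(f"  {name} {column_type(prop)}{not_null}")
    return f"CREATE TABLE {schema['name']} (\n" + ",\n".join(columns) + "\n);"
```

The mapping is a plain table lookup because each field already states an exact type. With a bare "number" the generator would have to guess between integer, float, and decimal columns.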

Question 9

What is the tooling ecosystem for JSON Structure?

Accepted Answer

I've built an SDK, and it's actually a proof point that a good spec helps. It would have taken half a year to build two years ago, but it took two weeks, because I have digital assistants now that can code, and they're really good when you give them a spec and validated examples. The SDKs are in about eight languages, including C#, C++, Rust, Python, Perl, TypeScript, and JavaScript, and they're all already distributed on the package managers, so you can get the JSON Structure package from PyPI and from the registries for the other languages. They let you validate a schema and validate a document against a schema, and they understand the core spec and the extensions. Then there's Avrotize, written in Python, which translates between schemas and has code generators. You put a schema in and you don't get one ginormous file you can't look at. You get a neatly organized project that compiles out of the box, with a README, documentation, and your descriptions landing as comments on the fields. The objects are self-serializing, and the C# generator can also do CBOR, MessagePack, XML, and Protobuf, so one class can be a hinge between all those formats.

Question 10

What is the relationship between the tooling and the spec?

Accepted Answer

There's also a question of spec quality that I've frankly been annoyed by. You'll find very few people who actually learn JSON Schema from the specifications, because the specifications just aren't very good — you start reading and you're halfway in and you still don't know what you're looking at. So I wanted JSON Structure to keep the familiar shape, the part that's easy — type, object, properties, a map of names to types — because everybody in software development knows that. The easy traps come with the conditional composition constructs, any-of, all-of, one-of, which let you very quickly build something that's no longer representable in a database or a programming language because there are no guardrails. So I built the tooling and the spec together, and the fact that the assistants do well with the spec is itself the evidence that writing it clearly pays off. The SDK existing in two weeks instead of half a year is the proof that a good, clear spec is what makes the whole tooling story possible.

Question 11

How do you manage schema at scale?

Accepted Answer

In the core spec you are just as constrained as you are with Avro — there are no external references, so you can rely on the fact that a core schema is entirely self-contained, which is a quality unto itself. When you do need to reach across boundaries — your team's schema, common schemas from your domain, industry standards for banking and so on — there's the import extension. You say you're importing this file, it resides over here, and you map it into a namespace inside your document; then all the references you make point into that mapped document. It's a two-step process, not "resolve a file and point straight into some particle of it." That makes referencing much easier because the ambiguity of what path a particular type needs to live at, depending on where you reference it from, completely goes away. And then xRegistry, the work we're doing in the CNCF, is how you manage versions of these schemas in a registry in a standardized way, either as documents or as APIs. It all ties together — Avrotize, the SDKs, the spec, and the registry.
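
The two-step process described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the behavior of any real tool: the placement of the "$import" key under a namespace in definitions, the example.com URL, the in-memory LIBRARY that stands in for fetching documents, and both helper functions are hypothetical.

```python
import copy

# Stand-in for fetching documents; a real tool would load files or URLs.
LIBRARY = {
    "https://example.com/schemas/common.json": {
        "definitions": {
            "Address": {"type": "object", "properties": {"city": {"type": "string"}}}
        }
    }
}


def apply_imports(document: dict, library: dict) -> dict:
    """Step 1: copy each imported document's definitions under a local namespace."""
    result = copy.deepcopy(document)
    definitions = result.get("definitions", {})
    for namespace, entry in list(definitions.items()):
        if isinstance(entry, dict) and "$import" in entry:
            imported = library[entry["$import"]]
            definitions[namespace] = copy.deepcopy(imported.get("definitions", {}))
    return result


def lookup(document: dict, ref: str) -> dict:
    """Step 2: follow a local reference into the now self-contained document."""
    node = document
    for segment in ref.removeprefix("#/").split("/"):
        node = node[segment]
    return node
```

A document that maps the shared file into a namespace called Common would then reference "#/definitions/Common/Address". The path depends only on the namespace chosen locally, not on where the imported file lives or where the reference is written.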

Question 12

What does policy management look like across schema?

Accepted Answer

For streaming data there's an interesting challenge: how do you maintain the speed of a pipeline while making sure no data that shouldn't be there gets through? You need to move filtering into the pipeline, with tooling that can look at what might pass and govern it — one subscriber may get the data, another may not. The way you do that is you write schemas and annotate them, hanging rule sets off types, and that requires a schema language that allows those extensions. That's in the back of my mind — it's not product reality yet, but it's certainly possible with the foundation JSON Structure lays, because it allows arbitrary extensions at any point where you could anchor those rule sets. We already have tooling at Microsoft like Purview for classifying data in databases, and Apache Atlas does similar metadata work. The deeper point is that most Kafka systems work today only because the publisher and consumer are the same people, or at least in the same room. Once they're not — and the other side may not even be human — you need a neutral ground to communicate the data needs through metadata, with strong tooling that can pick it up.

API Evangelist LLC

API Evangelist Conversation with Clemens Vasters on JSON Structure and Bringing Sanity to Schema, Metadata, and AI

Clemens Vasters
