Some months ago, my 6 year old son discovered Minecraft. His sister and I soon followed [1]. It’s a great game and there’s a ton of good content out there about it. Reddit and the forums are active, it has an excellent wiki, and there’s lots of YouTube stuff. Not surprising given how popular the game is.

One topic that is covered a bunch is how to best mine for diamonds. The thing is that beyond a graph of diamonds by layer, I don’t really see any data involved.

In this post, I talk about how I parsed a Minecraft Bedrock world database to get some more numbers. Topics include:

  • Pictures of some fun situations and overall distributions.
  • Some numbers concerning the densities/distributions of diamonds, other uncommon ores, and spawners.
  • How to parse the data yourself.
    • Mojang/Microsoft has released their variant of Google’s leveldb, but it doesn’t compile out of the box and it doesn’t come with a reference parser.
    • The wiki is very helpful but I find that actual code samples tell me more.

A big caveat…

This post mentions data from only one Bedrock world which hasn’t been traveled much. The seed is “-1337710146”. I don’t remember where I got it but it was from a YouTube video. Basically, I loaded it in creative and flew around a bunch to trigger terrain creation.

In the end, my data represents:

  • 3705 chunks of 16*16*256 blocks
    • so an area of 3705*16*16 blocks, or about 1000×1000
  • 87,949,312 blocks are actually stored. This is less than the 242M blocks in the world. There’s no need to store subchunks of the sky.
  • 12324 diamonds or about 3.3 per chunk.
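The arithmetic behind those bullets is easy to check. Here’s a small sketch that recomputes it from the numbers quoted above. The one thing not stated in this post is the 16*16*16 subchunk size, which is an assumption taken from how the Bedrock format is usually described. The constant and function names are made up for illustration.

```cpp
#include <cstdint>

// Numbers quoted in the list above.
constexpr std::int64_t kChunks       = 3705;
constexpr std::int64_t kStoredBlocks = 87949312;
constexpr std::int64_t kDiamonds     = 12324;

// Chunk dimensions from the list above; the subchunk size is an assumption.
constexpr std::int64_t kChunkSide      = 16;
constexpr std::int64_t kWorldHeight    = 256;
constexpr std::int64_t kSubchunkBlocks = 16 * 16 * 16;

// Surface area covered, in blocks (a bit under 1000 x 1000).
constexpr std::int64_t area_blocks() { return kChunks * kChunkSide * kChunkSide; }

// Blocks the world would hold if every chunk were stored at full height.
constexpr std::int64_t total_blocks() { return area_blocks() * kWorldHeight; }

// Number of 16x16x16 subchunks that are actually stored.
constexpr std::int64_t stored_subchunks() { return kStoredBlocks / kSubchunkBlocks; }

// Average number of diamond ores per chunk.
constexpr double diamonds_per_chunk() {
    return static_cast<double>(kDiamonds) / static_cast<double>(kChunks);
}
```

The stored block count divides evenly by 4096, which is at least consistent with whole subchunks being stored or skipped.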

My main point is to enable others to parse their data

My main intent in this post is to share the code I used to compute these numbers. I’d be really surprised if I don’t have a big bug in there. Don’t trust my numbers, play with the code yourself.

At the same time, I realize that most Minecraft players are not computer programmers. Hopefully, the stats I give are interesting to you.

See the second half of this post for instructions on how to parse Minecraft Bedrock data.

My code is available on github.

For non-programmers:

Some fun pictures

In my explorations playing the game, I hadn’t come across diamonds in lava. I’d known to watch out for diamonds over lava, but not swimming in it.

In this picture, there’s a diamond behind the cobblestone block, directly below flowing lava. There’s also one to the right of it.

This is the funnest one. My program gave me a list of diamonds under lava. I went to the location and found this… where’s the diamond?

Douse it with water.

Pick out the obsidian and there it is. You’d have to be incredibly lucky to find this one.

There’s only one diamond, so that luck wouldn’t have taken you very far.

I was also a bit surprised to find some spawners in lava.

Interesting stats

Given 87,949,312 stored blocks, here’s a high level breakdown:

[table id=2 /]


Diamonds, and other ores, tend to be found in clusters. If you find a diamond, you should dig around a bit as there are usually a couple more nearby. I define a cluster as a set of ores in which every member is 2 blocks or less away from another member.
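To make that cluster definition concrete, here’s a minimal brute-force sketch. It is not the code from my repository (that uses a spanning tree, described further down). It assumes “2 blocks or less” means straight-line (Euclidean) distance between block coordinates, and the names Pos and label_clusters are made up for illustration.

```cpp
#include <cstddef>
#include <vector>

struct Pos { int x, y, z; };

// Squared Euclidean distance between two block positions.
inline long dist2(const Pos& a, const Pos& b) {
    long dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Returns one label per ore; ores sharing a label belong to the same cluster.
// Brute force O(n^2) flood fill: simple, but slow for very large inputs.
std::vector<int> label_clusters(const std::vector<Pos>& ores, long max_gap = 2) {
    std::vector<int> label(ores.size(), -1);
    int next_label = 0;
    for (std::size_t start = 0; start < ores.size(); ++start) {
        if (label[start] != -1) continue;  // already part of a cluster
        label[start] = next_label;
        std::vector<std::size_t> todo{start};
        while (!todo.empty()) {
            std::size_t cur = todo.back();
            todo.pop_back();
            // Pull in every unlabeled ore within max_gap of the current one.
            for (std::size_t other = 0; other < ores.size(); ++other) {
                if (label[other] == -1 &&
                    dist2(ores[cur], ores[other]) <= max_gap * max_gap) {
                    label[other] = next_label;
                    todo.push_back(other);
                }
            }
        }
        ++next_label;
    }
    return label;
}
```

Note that two ores can be far more than 2 blocks apart and still share a cluster, as long as there’s a chain of close neighbors between them.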

In the table below, for each size of cluster, the number of such clusters is given for each ore type. In the case of diamonds, if you find one you’ll usually find between 4 and 6.

[table id=4 /]


We’re told to mine at y=12. Here is a table that shows where the ores are in my particular world. The heading contains the total number of each ore. If you’re looking for diamonds, you can basically look anywhere between 5 and 12.

[table id=5 /]

Caves vs mining

In various forums, there’s the question of how many diamonds you’ll find just by exploring caves. To find an ore in a cave, it needs to be next to [2] an air block. According to my data, out of 2661 diamond clusters, 77 of them had a member next to an air block. This represents 311 diamond ores or a little over 10%.
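The “next to an air block” test can be sketched like this. It’s a simplified stand-in, not the code from my repository: it assumes the air blocks have already been collected into a set, and the names Block, touches_air and cluster_visible are made up for illustration.

```cpp
#include <set>
#include <tuple>
#include <vector>

struct Block {
    int x, y, z;
    // Ordering so that Block can be stored in a std::set.
    bool operator<(const Block& o) const {
        return std::tie(x, y, z) < std::tie(o.x, o.y, o.z);
    }
};

// True if any of the six face neighbors of `ore` is an air block.
// Diagonal neighbors are deliberately not checked.
bool touches_air(const Block& ore, const std::set<Block>& air) {
    static const int kFaces[6][3] = {
        {1, 0, 0}, {-1, 0, 0}, {0, 1, 0}, {0, -1, 0}, {0, 0, 1}, {0, 0, -1}};
    for (const auto& d : kFaces) {
        Block neighbor{ore.x + d[0], ore.y + d[1], ore.z + d[2]};
        if (air.count(neighbor) > 0) return true;
    }
    return false;
}

// A cluster counts as visible from a cave if any one member touches air.
bool cluster_visible(const std::vector<Block>& cluster, const std::set<Block>& air) {
    for (const Block& ore : cluster) {
        if (touches_air(ore, air)) return true;
    }
    return false;
}
```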

Mining strategy

Online, there’s a lot of talk about the best branch mining strategy. Tunneling every 4 blocks guarantees seeing everything on your level and one above. But… it doesn’t take into account that if you find a desired ore, you’ll continue digging the area. Clustering.

So, I added some code to see how many clusters you’ll find with each branch mining spacing. If you find one member of a cluster, you’ll find the whole cluster.

So, the table below is kind of busy.

  • The rows represent the y/altitude level of your feet.
  • The columns represent the number of blocks between each branch. So if you have zero blocks between each branch, you’re mining two complete layers. One block means, branch/block/branch/block…
  • The heading tells you the number of blocks between branches and what percentage of the two layers you’re mining.
  • The table cells have two numbers.
    • The percentage of all diamonds you’d find for that level and branch spacing. Since diamonds are distributed over 10+ levels, you wouldn’t get them all.
    • On average, how many blocks you’d have to mine per diamond found. The numbers seemed low to me at first. 125 blocks per diamond was not my experience. Remember, however, that diamonds are usually next to friends. If you think of 4/cluster, that means you’d have to dig 500 blocks to find a cluster.

Sorry that the heading font is so big. I couldn’t find a way to get TablePress to make it smaller.

In the end, it doesn’t really seem to matter much how many blocks you skip, though it does seem that digging a little deeper is helpful.

When computing these numbers, I consider that you’ll find diamonds if:

  • A cluster member is in the branch you’re mining,
  • directly above or below it, or
  • directly to the left or right of it.
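Those three conditions can be sketched as a single test on a block’s position. This is a simplified model, not the code from my repository: it assumes tunnels are one block wide and two blocks tall (feet level and head level), that they run parallel to the z axis, and that the first one sits at x = 0. The function names are made up for illustration.

```cpp
// Floor-style modulo so negative coordinates behave like positive ones.
inline int floor_mod(int a, int m) { return ((a % m) + m) % m; }

// True if column x holds a branch tunnel, with `gap` solid blocks between branches.
inline bool is_branch(int x, int gap) { return floor_mod(x, gap + 1) == 0; }

// True if a block in column x at height y would be found while digging
// the branches at feet level feet_y.
bool is_found(int x, int y, int feet_y, int gap) {
    const bool tunnel_level = (y == feet_y || y == feet_y + 1);
    if (is_branch(x, gap)) {
        // In the tunnel itself, or directly below (floor) or above (ceiling) it.
        return tunnel_level || y == feet_y - 1 || y == feet_y + 2;
    }
    // Otherwise only visible as the side wall of a neighboring tunnel.
    return tunnel_level && (is_branch(x - 1, gap) || is_branch(x + 1, gap));
}

// Fraction of the two layers that actually gets dug out.
inline double fraction_mined(int gap) { return 1.0 / (gap + 1); }
```

A cluster is then counted as found if is_found is true for any one of its members.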

[table id=6 /]

Parsing the data yourself/the code

There are three parts to the code you may care about [3]:

  • Getting Mojang’s leveldb code to work
  • Parsing the actual structures
  • Computing statistics

Getting Mojang’s leveldb code to work

Mojang was “kind enough” to release the leveldb code. Super helpful, except that it didn’t compile out of the box for me.

Why do I put quotes around “kind enough”? Because I want to go on what I think is an interesting tangent.

A tangent on open source software

I used to work at Intel Corp as a computer programmer. Intel’s design team uses A LOT of Perl code. Until 5 years ago, when I left, it was mostly Perl 5.8.5 but that was beginning to change. I believe the version was 8 years old at the time. It was mostly a matter of libraries/packages compatibility.

Anyway, I really liked Perl and I wrote a lot of it. Over time, I cooked up some packages that I thought others in the Perl community would find useful. So I contacted the folks in my division who deal with open source stuff. After one conversation with the guy it became clear to me, “Intel really doesn’t want to release anything into open source”

Intel is dependent on open source software. Design has always been done on various flavors of unix/linux. Redhat and SUSE are two that I remember.  Lots of other related software. Boost, GNU, the list is long. Still, Intel really doesn’t want to release anything.

Do you feel outraged? Don’t be. The reasons are not Intel’s fault.

Intel goes to great lengths to pay for free open source software. Intel searches for someone to pay for everything they use. GNU… they pay someone (FSF? Redhat/SUSE?). They pay Nokia to use Qt. If you want to use a piece of OSS, you need to find someone who will take money for “support”, but it’s not about support, it’s about paying for what Intel uses [4].

But why? Intel is a large company with lots of money. If they were taken to court, it’d be easy to convince a jury that big bad Intel is taking advantage of people. So they insist on paying for everything.

During my 22 years there, I heard the following several times, “if you ever end up on the stand in court, there will be a lawyer who is much much smarter than you who will make you look like an evil idiot”

So why couldn’t I release my Perl module on github or wherever?

How do I know my module is 100% my code? How do I know I didn’t copy some code from Stack Overflow? By releasing my module, I’d (as someone working at Intel) be open sourcing someone else’s code. That someone did not give Intel permission to do that.

When I was discouraged from opening my code, it wasn’t about being overprotective about Intel’s IP. There’s very little that I know that would be of any interest to NVidia, AMD/ATI, TSMC, Apple… To get what I know, all they’d have to do is hire me. I know lots of people that have switched to those companies.

It wasn’t about protecting Intel’s IP, it was about protecting the IP of others, or more accurately, not stepping on the rights of others ourselves. Don’t want to end up on the stand against the smart lawyer.

Thankfully, since leveldb was already open sourced by Google, Mojang had no choice but to release their own changes.

As far as I can tell, the only change they made was to add some additional compression to the format. If they hadn’t done that, I imagine we wouldn’t have gotten a leveldb from them at all. We’d have no idea which version of the code applies.

I do believe that Mojang wants to be open. It’s core to the philosophy of Minecraft.

End tangent

Getting leveldb-mcpe to compile/work

Although it burned a bunch of time figuring it out, the answer is easy. Two things:

  • The existing code includes <snappy/snappy.h>. On my ArchLinux system, I need to include <snappy.h> instead.
  • There’s a file which wants the function port::Snappy_Compress, but it’s missing. Just comment that line out. It’s only benchmarking stuff and doesn’t matter if you’re just trying to parse a Minecraft world.
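If you’d rather not hard-code one include path, here’s a sketch of a shim that picks whichever snappy header location exists. It is an alternative to simply editing the include line, not what I did, and it needs a compiler that supports __has_include (standard since C++17). The HAVE_SNAPPY_HEADER macro is a made-up name for illustration.

```cpp
// Pick whichever snappy header location exists on this system.
#if defined(__has_include)
#  if __has_include(<snappy.h>)
#    include <snappy.h>
#    define HAVE_SNAPPY_HEADER 1
#  elif __has_include(<snappy/snappy.h>)
#    include <snappy/snappy.h>
#    define HAVE_SNAPPY_HEADER 1
#  endif
#endif

// Neither location was found, or the compiler can't check.
#ifndef HAVE_SNAPPY_HEADER
#  define HAVE_SNAPPY_HEADER 0
#endif
```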

Stupid easy, but my experience as a programmer didn’t let me believe it at first. I searched for old versions and figured that if one thing’s missing, what’s next.

Parsing the actual data

Big/little endian

On the Minecraft wiki, both big and little endian are mentioned in several places, but all of the data I came across uses the same format: little endian. I use this function to extract all 4 byte numbers:

#include <cstdint>
#include "leveldb/slice.h"

int32_t get_intval(leveldb::Slice slice, uint32_t offset)
{
    uint32_t retval = 0;
    for (int i = 0; i < 4; i++) {
        // Little endian: byte i supplies bits 8*i through 8*i+7.
        // If I don't do the cast to uint8_t, the top bit will be sign extended.
        retval |= static_cast<uint32_t>(static_cast<uint8_t>(slice[offset + i])) << (i * 8);
    }
    return static_cast<int32_t>(retval);
}

NBT Data

All NBT elements have a name. That name is often a zero length string, but there’s a name. No big deal.

NBT strings also have a name, so a string element is basically two strings back to back: the name, then the value.
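To make the “two strings” point concrete, here’s a minimal sketch of reading one string element from a byte buffer. It is not the code from my repository. It assumes the layout described on the wiki: a one-byte tag type (8 for a string), then the name and the value, each stored as a 2 byte little endian length followed by that many bytes. The names NamedString, read_prefixed_string and read_string_tag are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

struct NamedString {
    std::string name;   // often empty, but always present
    std::string value;
};

// Reads a 2 byte little endian length followed by that many bytes.
// Advances `pos` past everything it consumed.
std::string read_prefixed_string(const std::string& buf, std::size_t& pos) {
    if (pos + 2 > buf.size()) throw std::runtime_error("truncated length");
    std::size_t len = static_cast<std::uint8_t>(buf[pos]) |
                      (static_cast<std::uint8_t>(buf[pos + 1]) << 8);
    pos += 2;
    if (pos + len > buf.size()) throw std::runtime_error("truncated string");
    std::string out = buf.substr(pos, len);
    pos += len;
    return out;
}

const std::uint8_t kTagString = 8;  // assumed tag type id for strings

// Reads one complete string element: type byte, name, then value.
NamedString read_string_tag(const std::string& buf, std::size_t& pos) {
    if (pos >= buf.size() || static_cast<std::uint8_t>(buf[pos]) != kTagString) {
        throw std::runtime_error("not a string tag");
    }
    ++pos;
    NamedString tag;
    tag.name = read_prefixed_string(buf, pos);   // first string: the element's name
    tag.value = read_prefixed_string(buf, pos);  // second string: the payload
    return tag;
}
```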

Parsing the rest

I won’t write anything about that. It’s either covered on the Minecraft wiki or easy to get out of my code. The relevant file is this one. Note that I only pay attention to the block code. Mobs and players are ignored in the current version.

It’s not a particularly large file.

Clustering ores

To cluster the ores, I used an algorithm that’s probably overkill. I compute a Euclidean Minimum Spanning Tree. The algorithm for MST is one that CS majors learn, but that algorithm requires a graph. The easy answer is a complete graph, but if you have 1000 nodes, that’s about half a million edges. Instead, you run it on a triangulation, specifically the Delaunay triangulation. CGAL and Boost libraries to the rescue.
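Here’s a minimal sketch of the spanning tree idea, with one big simplification: it runs Kruskal’s algorithm on the complete graph instead of a Delaunay triangulation, so it doesn’t need CGAL or Boost but only makes sense for small inputs. It also assumes clusters are formed by cutting every tree edge longer than 2 blocks. None of this is the code from my repository, and all names are made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

struct Pos { int x, y, z; };
struct Edge { std::size_t a, b; long len2; };  // len2 = squared length

// Minimal union-find with path halving.
struct DisjointSets {
    std::vector<std::size_t> parent;
    explicit DisjointSets(std::size_t n) : parent(n) {
        std::iota(parent.begin(), parent.end(), std::size_t{0});
    }
    std::size_t find(std::size_t v) {
        while (parent[v] != v) {
            parent[v] = parent[parent[v]];
            v = parent[v];
        }
        return v;
    }
    // Merges the two sets; returns false if they were already merged.
    bool unite(std::size_t a, std::size_t b) {
        a = find(a);
        b = find(b);
        if (a == b) return false;
        parent[a] = b;
        return true;
    }
};

// Kruskal's algorithm over all pairs of points. Returns the tree's edges.
std::vector<Edge> euclidean_mst(const std::vector<Pos>& pts) {
    std::vector<Edge> edges;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        for (std::size_t j = i + 1; j < pts.size(); ++j) {
            long dx = pts[i].x - pts[j].x;
            long dy = pts[i].y - pts[j].y;
            long dz = pts[i].z - pts[j].z;
            edges.push_back({i, j, dx * dx + dy * dy + dz * dz});
        }
    }
    std::sort(edges.begin(), edges.end(),
              [](const Edge& l, const Edge& r) { return l.len2 < r.len2; });
    DisjointSets sets(pts.size());
    std::vector<Edge> mst;
    for (const Edge& e : edges) {
        if (sets.unite(e.a, e.b)) mst.push_back(e);  // shortest edge joining two trees
    }
    return mst;
}

// Cut tree edges longer than max_gap; whatever stays connected is one cluster.
std::size_t count_clusters(const std::vector<Pos>& pts, long max_gap = 2) {
    DisjointSets sets(pts.size());
    std::size_t clusters = pts.size();
    for (const Edge& e : euclidean_mst(pts)) {
        if (e.len2 <= max_gap * max_gap && sets.unite(e.a, e.b)) --clusters;
    }
    return clusters;
}
```

The all-pairs edge list is exactly the cost the triangulation avoids: the tree’s edges are all contained in the Delaunay triangulation, which has far fewer edges to sort.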

I guess that’s it. It feels kind of abrupt, but I can’t think of additional relevant details. Comments and questions are welcome.

Next Steps

Minecraft behavior is often a mystery. One thought I have is creating a module that could occasionally run on a server that scrapes interesting data about your world and maybe presents it on a website. One example that comes to mind is village data. Where are your village centers? Perhaps you accidentally merged multiple villages. Is it cheating to find out?

Would this be interesting?

Please like, share, comment and subscribe. And don’t forget to click the notification bell.

Oh wait, this is a blog. I’d love to get comments though.

Parsing and analyzing Minecraft ore distributions

  1. I actually bought it for my daughter on the Google playstore probably five years ago, but she wasn’t interested. Now, she’s as hooked as the rest of us.

  2. “next to” does not include diagonal

  3. There are three kinds of people in the world. Those that can do math, and those that can’t

  4. Sadly, they really want to pay a company. We once tried to get Intel to pay a gdb developer to fix some bugs that were biting us. No luck.
