Author: MrDienns

Owner & Lead Developer of Dyescape
About me:
I'm the owner and lead developer of Dyescape. I'm a 19-year-old software engineer, familiar with various programming languages and technologies such as C#, PHP, Java, JavaScript and jQuery, as well as the basic web languages HTML and CSS. I'm extremely interested in cyber crime and cyber security. When I have some spare time, I try to learn more about cyber security so I can hopefully one day become a certified ethical hacker.

All plugins on Dyescape are coded by me, and I'm also responsible for setting up and maintaining all of the servers. I have big plans when it comes to Dyescape's software and servers.

About Dyescape:
I've been playing Minecraft since the 1.0.0 release. Around version 1.2.5 I got into server management and running a server of my own. I've been in the business ever since and I love doing it. Some time ago, I created a small survival network called Dyescape. It didn't work out great, as the player count never got high or stable. The staff team and I decided to shut everything down and turn Dyescape into something truly unique, which is what we all know today.

Dear community, as some of you may have noticed, we've suffered from several infrastructure outages over the recent weeks. These caused our website, Minecraft servers and internal tooling to be unavailable. We wish to be transparent with everyone about what is going on, why these outages happened and how we plan to tackle them moving forward. The post below involves some technical details; you can skip to the summary at the bottom for a short, simplified version.

Storage
Within the hosting industry, and especially cloud hosting, storage comes in many different shapes and sizes, each with its own pros and cons. Some setups value simplicity, some value scalability, others value performance or integrity. When setting up our infrastructure, we had to choose between these options and decided to go with a storage solution that offered integrity and scalability. As such, we are running an internal block storage solution called Longhorn, with volume replication across all of our servers in geographically different data centers. This means that all machines are constantly replicating each other's volumes. This is great for keeping data intact and fault-tolerant, as well as for being able to quickly move software deployments such as databases to new machines without having to wait for file transfers.

That sounds great, so why bring it up? Everything works fine as long as everything remains connected. As soon as these nodes lose connection with each other, the machine as a whole is marked as unhealthy. That by itself is okay, because we can then deploy software on different machines and recover quickly. The problem, however, is that a node becoming unhealthy results in the storage solution shutting itself down on that machine. The replica status is lost and no health can be communicated anymore, so the data replica on the unhealthy node is immediately considered degraded. This doesn't mean data loss, nor does it necessarily cause any damage, but it does cause the storage solution to rebuild the entire replica on that particular node from a node that was healthy.

As some of you may know, we replicate data across our American and European servers. Repairing a volume across the Atlantic Ocean comes with a big drop in network capacity. To save budget and reserve extra capacity for alpha and beta development, excess hardware has been removed; as a result, we cannot reliably place replicas of these volumes across multiple machines on the same continent and are forced to replicate across continents. A full replica rebuild therefore takes time, and doing this with dozens of volumes puts considerable strain on the connection between the two continents. In some cases, volumes cannot be attached to workloads before they are fully healthy. This was the cause of the lengthy outages over the past few weeks. Why machines lost connection in the first place is disclosed later in this post.
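To give a rough sense of why these rebuilds hurt, here's a quick back-of-the-envelope sketch of how long a single replica rebuild takes over a constrained transatlantic link. The volume size and bandwidth below are illustrative numbers, not our actual figures.

```java
// Rough illustration only: estimating how long a full replica rebuild takes
// for a given volume size and usable cross-continent bandwidth.
// The numbers below are made up for the example; they are not our real figures.
public class RebuildEstimate {
    public static void main(String[] args) {
        double volumeGiB = 50.0;   // hypothetical volume size
        double linkMbps = 200.0;   // hypothetical usable transatlantic bandwidth

        // GiB -> megabits (approximate conversion), then divide by link speed.
        double megabits = volumeGiB * 1024.0 * 8.0;
        double seconds = megabits / linkMbps;

        System.out.printf("Rebuilding a %.0f GiB replica at %.0f Mbit/s takes roughly %.0f minutes%n",
                volumeGiB, linkMbps, seconds / 60.0);
    }
}
```

With these example numbers a single volume already takes over half an hour to rebuild, and that time scales linearly with volume size and with how many volumes are competing for the same link.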

The above wouldn't be a concern if a short connection drop didn't immediately kill the replica instance on the machine that loses connection. The developers of said storage solution agree, and a bug/feature ticket has been created on their GitHub as a result. We were aware of this issue and have been tracking it for some time. Unfortunately, the priority wasn't high enough initially and a resolution has been pushed back to the next major update. Since the outage last weekend, we have tweaked several settings in this storage solution and are monitoring the results. We have an alternative storage plan drawn up in case this solution remains problematic.

Nodes losing connection
As discussed above, issues occur when nodes lose connection. This by itself has several causes; we'll share three of them here.

First, a worldwide OVH outage on the 13th of October caused a connection loss between all of our servers. Exact details are missing, but the general understanding is that OVH botched a large-scale routine BGP update, causing all routes to disappear. The same happened during the worldwide Facebook outage not long before that. In a case like this we are powerless, and with outages like this the Minecraft industry as a whole takes a beating. The outage was solved by OVH within a reasonable time, but it caused our storage solution to panic, and all volume replicas had to be rebuilt as a result.

Secondly, we only recently came to understand the magnitude of the bug (or at least, limitation) in our storage solution. We've performed several rolling updates on our orchestration platform in production to keep everything up to date. This can normally be done without any downtime, but because of the storage limitation it does cause downtime. We weren't aware of this until it had happened a few times.

Thirdly, and most recently, there was a problem with our CNI (Container Network Interface) plugin. We are still unsure of the exact cause, but the plugin suddenly crashed on one of our machines. This caused a connectivity loss between processes on that particular machine. Rebooting & reinstalling part of the systems did not help. We could not trace the cause, but we eventually tackled the issue by updating the kernel and operating system on the machine. We suspect an unknown, undiscovered bug triggered by an edge case in our specific combination of kernel, operating system and CNI plugin versions.

Summary
In short, a combination of sudden connectivity drops and a current limitation of an internal storage solution caused several unexpected, lengthy outages. Moving forward, we are adjusting settings appropriately to get the desired behavior of our systems and are continuously monitoring the results. An alternative plan for storage is ready in case this setup remains problematic.

We sincerely apologize for the outages. We are still actively learning from all of the feedback we've gotten since launch, both on a functional and a technical level. Thank you for your continued understanding and patience.

After a tremendously long development period, Dyescape has finally been able to put out its first release: v0.1.0. Those who were active in the Discord or the Twitch livestream yesterday will have already seen that we ran into a few technical struggles. This thread serves to be transparent with the community about what happened, why it happened, how it could happen and what has been done to address it. On a positive note, we'll also list a few technical things that did go well.

Database cluster
In a project like this, being dynamic with content changes is crucial. Software and content are two completely separate complexities, and to ensure they don't merge into a single, even larger complexity, we've made everything configurable. From items to skills, creatures, quests and random encounters: everything is configurable. To facilitate this, a JSON document-based storage system was set up. For the past years of development this was done through flat files & SFTP, allowing the content team to quickly make changes, test them and commit them to version control. A few months ago, a MongoDB cluster was set up to make this production-ready.
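As a minimal sketch of what this looks like at runtime (assuming the standard MongoDB Java driver; the database, collection and field names below are made up for the example, not our actual schema), an item definition is simply a JSON document the server fetches and reads:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

// Minimal sketch: loading a configurable item definition from MongoDB.
// The connection string, database, collection and field names are hypothetical.
public class ItemConfigLoader {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> items = client
                    .getDatabase("dyescape")   // hypothetical database name
                    .getCollection("items");   // hypothetical collection name

            // Each item is just a JSON/BSON document the server reads at runtime.
            Document sword = items.find(new Document("id", "iron_longsword")).first();
            if (sword != null) {
                System.out.println("Name: " + sword.getString("name"));
                System.out.println("Damage: " + sword.getInteger("damage", 0));
            }
        }
    }
}
```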

Having set up the database cluster, everything was operational. We could import our content, play the game, save characters, swap to a different server, and everything would be there. However, before launch we still had a few important content & software changes to process. While doing so, something in the database got upset, causing the cluster's sharding initialisation to fail and the cluster to become unusable. This was the first major technical setback we ran into, and it's what caused the initial release to be two hours late. Thankfully, the team quickly jumped in and we managed to solve the issue.
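One lesson for future launches is to verify the cluster's sharding state before processing large content changes. A minimal check of that kind could look like the sketch below, assuming the MongoDB Java driver and a connection to the mongos router; this is an illustration, not our actual tooling:

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import org.bson.Document;

// Illustrative sketch: asking the mongos router which shards it knows about
// before rolling out large content changes. Not our actual tooling.
public class ShardHealthCheck {
    public static void main(String[] args) {
        // Hypothetical mongos address; replace with the real router address.
        try (MongoClient client = MongoClients.create("mongodb://mongos.example:27017")) {
            Document result = client.getDatabase("admin")
                    .runCommand(new Document("listShards", 1));

            for (Document shard : result.getList("shards", Document.class)) {
                System.out.println(shard.getString("_id") + " -> " + shard.getString("host"));
            }
        }
    }
}
```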

GEO load balancing
GEO load balancing is a technical setup that automatically routes users to the nearest server. It is set up by our Anti-DDoS & load balancing provider. From what we could see, this load balancing worked for most users. Based on earlier playtests, North American users see roughly 40 to 60 ms of ping on average, and Western European users around 15 to 25 ms. This GEO load balancing is set up for two main reasons: to give users the best possible connection and reduce lag, and to limit cross-continent bandwidth usage.
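Conceptually, the routing decision boils down to sending each client to the region with the lowest expected latency. The real GEO load balancing is handled at the DNS level by our provider rather than in our own code, but a toy version of the idea looks like this:

```java
import java.util.Map;

// Toy illustration of latency-based region selection. The real GEO load
// balancing is handled by our provider at the DNS level; this only shows the idea.
public class NearestRegion {

    // Return the region with the lowest measured latency for one client.
    static String pick(Map<String, Integer> latencyMsByRegion) {
        return latencyMsByRegion.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElseThrow(() -> new IllegalArgumentException("no regions known"));
    }

    public static void main(String[] args) {
        // Hypothetical latencies measured for a Western European player.
        System.out.println(pick(Map.of("eu", 20, "na", 110))); // prints "eu"
    }
}
```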

However, due to a misconfiguration on the proxy service discovery side, players were often sent to a fallback server in a different region. They would connect to a North American proxy, but a European fallback server, for instance. In some unlucky cases, people didn't get the expected GEO load balancing on the proxy to begin with. This caused, for example, American users to connect to a European proxy and then to an American fallback server, doubling the already poor latency. However, this only affected a handful of people and was likely caused by some fallback servers crashing (more on this later).

To fix a few latency issues some players were having, two new domains have been created: eu.dyescape.com and na.dyescape.com. To keep things as stable as possible for the time being, the proxy & fallback setup has now been fully split into two separate networks, so there is no chance of ever being routed to a fallback server of the wrong region. The play.dyescape.com domain should still provide accurate GEO load balancing; if not, please contact the team.
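On the proxy side, the underlying fix boils down to only ever considering fallback servers from the proxy's own region. The sketch below shows that selection logic in simplified form; the Region values and server names are hypothetical, not our actual plugin code:

```java
import java.util.List;
import java.util.Optional;

// Simplified illustration of region-consistent fallback selection.
// The Region values and server names are hypothetical, not our actual plugin code.
public class FallbackSelector {

    enum Region { EU, NA }

    record GameServer(String name, Region region, boolean online) {}

    // Pick the first online fallback server in the same region as the proxy,
    // so a player on a North American proxy can never land on a European fallback.
    static Optional<GameServer> pickFallback(Region proxyRegion, List<GameServer> fallbacks) {
        return fallbacks.stream()
                .filter(server -> server.region() == proxyRegion)
                .filter(GameServer::online)
                .findFirst();
    }

    public static void main(String[] args) {
        List<GameServer> fallbacks = List.of(
                new GameServer("fallback-eu-1", Region.EU, true),
                new GameServer("fallback-na-1", Region.NA, true));

        System.out.println(pickFallback(Region.NA, fallbacks)); // fallback-na-1 only
    }
}
```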

Cross-continent bandwidth
We have explicitly set up the infrastructure to have considerable network bandwidth: at our hosting provider we run a large private network consisting of four solid dedicated servers, each with 8 physical cores, 64 GB of memory and a 2 Gbit connection. Despite our initial thoughts yesterday, we currently see no signs of bandwidth running short. The timeout errors we were seeing yesterday were actually caused by a software issue; the resulting problems for players, however, made us think it was network-related.

Crashing fallback servers & timeouts
While the Dukes were playing, we were notified of numerous timeout & crashing issues. After some investigation, these seemed to be caused by a recent software change in our interactive chat plugin. A code issue caused an infinite loop, exhausting the CPU capacity and killing the instance. This issue was fixed at around 02:15 AM our time. Afterwards, the Dukes could continue playing.
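To illustrate the class of bug (this is a constructed example, not the actual chat plugin code): a loop that can hit a branch where its loop variable never advances will spin at 100% CPU and eventually take the whole instance down with it.

```java
import java.util.ArrayList;
import java.util.List;

// Constructed example of the bug class, not the actual chat plugin code.
public class ChatWrapExample {

    // Buggy version: when the remaining chunk contains no space after 'start'
    // (for example a single long word at the end), 'start' is never advanced
    // and the while loop never terminates.
    static List<String> wrapBuggy(String text, int width) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(start + width, text.length());
            int lastSpace = text.lastIndexOf(' ', end);
            if (lastSpace <= start) {
                continue; // BUG: 'start' is unchanged, so this loops forever
            }
            lines.add(text.substring(start, lastSpace).trim());
            start = lastSpace + 1;
        }
        return lines;
    }

    // Fixed version: every iteration moves 'start' strictly forward, so the
    // loop is guaranteed to terminate no matter what the input looks like.
    static List<String> wrapFixed(String text, int width) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(start + width, text.length());
            int lastSpace = text.lastIndexOf(' ', end);
            boolean breakAtSpace = lastSpace > start && end < text.length();
            if (breakAtSpace) {
                end = lastSpace;
            }
            lines.add(text.substring(start, end).trim());
            start = Math.max(end + (breakAtSpace ? 1 : 0), start + 1);
        }
        return lines;
    }

    public static void main(String[] args) {
        // wrapBuggy("Hello adventurer", 40) would never return; the fixed
        // version wraps and terminates as expected.
        System.out.println(wrapFixed("Hello adventurer, welcome to Dyescape!", 20));
    }
}
```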

Remaining issues & alpha queue
The question we've seen posted in Discord extremely often since the release is why the queue is not processing. There's a good reason for this: although we've fixed every infrastructure & fatal software issue we've seen, there are still a few in-game issues causing quests to become stuck, blocking progression. These issues are scheduled for a v0.1.1 patch release, which is planned to go live in the upcoming days. The team is working around the clock to get this patch out.

Because these remaining issues prevent gameplay progression, we've decided not to progress the queue until v0.1.1 goes live. We decided the best option for now was to have only the Dukes test the game; they're a small group of people who can help us efficiently identify critical issues.

Positive notes
On to some positive comments, because despite the technical issues, there are also multiple compliments worth mentioning. I'll go over some of these below:

Gameplay feedback (from Dukes): while currently only Dukes are able to play, we've received very positive feedback from them. The game is smooth, skill usage is bliss, content is interesting & understandable, and there are multiple quality-of-life features that are very much appreciated. After fixing the regional connections & freezes, ping seems to be good and the average milliseconds per tick looks healthy. Once the in-game issues are fixed in the v0.1.1 launch, we can likely start processing the queue.

Conclusion & thank you
We want to close this off with a massive thank you. Albeit rough around the edges, Dyescape has launched. It has taken us well over four and a half years to get to this point, and it has been an incredible ride. We've had our ups and we've had our downs. We've seen team members come and go, content be revamped, software be overhauled and a community grow.

Despite the messy infrastructure launch, in all of our years of playing Minecraft we have not seen a more considerate, friendly & heartwarming community. The support is incredible. We will continue to work hard to get the fixes out so that everyone is able to join. Hope to see everyone on the server in v0.1.1!

Dear community,

It's been a while since we had a positive announcement (other than the glorious mousepad giveaway), but today is a great day for us. No no no, hold your horses, the Alpha is not here yet. I'm sorry to crush your hopes and dreams like this already. However, I'm very pleased to announce that we're getting some massive help on the development side. We are expanding the backend team (the nerd team) with not 1, not 2, but 3 new people! Please give a heartwarming welcome to @MiniDigger, @MisterErwin and @Michael!

@MiniDigger will be helping me out creating more Minecraft plugins. His first job will be something that you're all gonna love: guilds! It's not an easy plugin, but with him joining the team we can deliver this feature a lot sooner than originally expected. He already fixed a bug less than 2 hours after I gave him access, so I have high hopes there.

@MisterErwin will be helping us out with creating some more advanced software unrelated to Minecraft. His first job will massively help us straight away when we go live in Alpha. He's going to work on an application that lets us keep statistics on everything that happens in our game. We can use it for balancing purposes, as it will give us a very clear view of what exactly is happening, even with hundreds or thousands of players. It is also confirmed that he likes pineapple pizza.

@Michael is a system administrator who's going to help me build a new development environment as well as a production environment to host our game on. As some of you can probably imagine, Dyescape is a fairly complex project and our network is no exception, hence having a dedicated person for this special job will massively help out.

Last but not least, @Astantos is currently taking a few steps back from being content manager and joining as a regular content team member. Due to some personal reasons & the overall time he says he's able to invest in the project, we all came to the conclusion that it would be better for someone else to take over for the time being. @Aekalix will be using his iron fist to take the content team to the next level and structure the team in the most efficient way, so we can deliver our project as soon as possible.

I'd also like to massively thank everyone who's here in the Discord. The community is still growing every day and we really appreciate the support we've been getting lately. We are getting close!

Thanks to each and every one of you for being with us!

People who are active in our Discord server heard the news a few days ago, but we have not posted it here on our website yet. As you might have seen, the alpha has not launched yet. This is simply because we are still behind schedule in terms of content. The backend (plugins) is roughly on schedule, while the frontend (buildings & terrain) is even ahead of schedule. The pre-alpha has been delayed several times now and this won't be the last time. We apologize for this, but it's the reality and we can't make it any prettier. We are trying our hardest to get the pre-alpha open as soon as possible, but we cannot launch an incomplete game. Our apologies for this.

Note that applications for the content team are still open. Please send some of your work to [email protected], as we are always looking to expand the team with high-quality people.

When is the new deadline for the alpha? For the sake of letting this project run smoothly, there isn't one for a while. The problem with Dyescape development at the moment is that some parts of the server (mainly content) are still extremely hard to schedule. This causes us to keep setting deadlines that we hope to reach, but realistically speaking simply can't. That is what causes the delays. The delays are not caused by issues during development, or by development simply stopping for days. In fact, development is not slowing down; it's only speeding up more and more. We are working faster and more efficiently every day; scheduling is simply our problem at the moment.

No, the alpha will not open anytime soon, but the good news is that development is not held back by this. It's simply because we have an extremely hard time scheduling some parts of the server. Hence, from now on, the alpha is delayed until further notice. We will not set any deadline anytime soon, so the team can simply work on quality results without having to stress about unrealistic deadlines. We encourage everyone interested in the project to join our Discord server, as we are very active on it. We post small development updates almost daily about what we have done, what we are focusing on and what we have achieved.

These delays also come with some good news, though. The alpha is delayed due to content; this, however, does not mean the other parts of the server are behind schedule. With the frontend (buildings & terrain) being ahead of schedule, we can actually release new parts of the map faster after we launch the alpha. The backend (plugin) team now also has more time to start developing plugins that were originally out of scope for the initial launch, such as dungeons & boss rooms.

These news posts don't show Dyescape's true bright side, nor do they show what we are capable of. Please do not judge our project based only on the information you see on this website. Join our Discord server, interact with the community, see our development log channel and talk to the actual team behind it.

As some of you might have already seen in our Discord server, we ran into a few critical issues in terms of plugins and configuration last weekend, causing some delay to development once again. On top of this, we are not yet happy with the amount of content we have. We deeply apologize, but we will have to delay the pre-alpha launch by another two weeks, putting the new deadline at March 16th.

We do not like the delays either; in fact, we hate delaying things, but we want to live up to people's expectations and assure the level of quality that each and every one of you deserves. From the looks of it, Dyescape is becoming the world's largest Minecraft MMORPG ever created (in terms of map size, amount of planned content and plugin complexity), and we've only been in development for a good year. Compared to other, similar servers, that is blazingly fast, and I am proud to have a dedicated team working hard on this project every day. In our Discord server we have a highly active #devlog channel where we post updates several times a day about the work that has been done. We highly suggest that anyone interested in the project hop on the server and communicate with us through Discord as well.

Please do not see the delays as a bad thing. Delays happen because we are either not happy with the quality, or because we ran into severe issues, critical bugs or game mechanic exploits. We are working hard to get these fixed as soon as possible, but there is no point in going live with issues like this, hence the delay. Patience is key; rushing everything and hacking things together will not help anyone. Please understand that we are merely doing this to live up to our quality standards.

Thank you