Common Stability Misconceptions

Stability is a key factor for any grid or region and has long been a point of contention among operators. Software is never truly free of bugs, and the more complex it becomes the more easily stability gets overlooked. Thankfully the current push toward updating some of the foundations and utilizing more current technology has brought a renewed focus on stability, compatibility and performance. That said, let’s take a look at the common stability misconceptions and the data that backs them up or disproves them.

Traffic equals instability

The idea that constant traffic causes more and more garbage to collect without being properly cleared has been around for a long time. For the most part the core concept is true: some accumulation of abandoned data can never be fully avoided. Clearing out and freeing up resources, however, has massively improved with the change to a more recent version of the .NET framework (see picture). This major shift was accompanied by various fixes to the consumption and, more importantly, the re-allocation of used resources. Returning those resources has brought an overall reduction in resource usage that often sits in the 30% range. Such a change is definitely noticeable and even measurable. The bigger the resource usage was before these changes, the bigger the gains are. Some of our customers are seeing reductions beyond 50% simply from upgrading to more recent versions. That said, there is still some work to be done, and given the nature of the framework some resource leakage, as it is commonly called, still occurs. Levels are much lower compared to a few years ago, but edge cases still exist and we still recommend refreshing areas with heavy traffic often, especially when the clientele has a tendency to, let’s put it mildly, act less gracefully in their self-accessorization (is that even a word?).

Error means crash

Humans are quite capable of handling errors in their own “programming” or execution of tasks. Most programs, however, tend to struggle with that and need a helping hand from their programmers to make sure they don’t Windows98 on their users. Handling is often done by quite literally attempting to execute a task and waiting for a return. Should the return not occur, the program simply continues, throwing whatever broke right into the user’s face, essentially for them to fix, should they know how. The other common options are the typical “has stopped working” you are all used to, with subsequent sending of bug reports (which are actually read, believe it or not), or the “write to log and die” method of simply closing with the user being none the wiser as to what happened. The more complex a program becomes, the more problematic it can be to employ those latter methods, so proper error handling is vital. When so-called “hard crashes” are reported these days, they can generally be attributed to misuse. The times when a simple error in an item or script caused irrecoverable shutdowns of vital functions are almost gone, and the remaining cases can usually be traced back to something that is easily fixed or, well, completely out of user control. Either way, crashes are becoming less and less frequent with each new piece that is reviewed and brought up to current programming specifications.
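To make the difference between “write to log and die” and proper handling a bit more tangible, here is a minimal sketch in Python. It is not how OpenSimulator does it; the task names and the run_all helper are made up for illustration. The idea is simply that each task runs inside its own safety net, so one broken script is logged and skipped instead of taking everything else down with it.

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("region")  # hypothetical logger name


def run_all(tasks):
    """Run every task; isolate failures so one bad task cannot stop the rest.

    `tasks` maps a (hypothetical) task name to a zero-argument callable.
    Returns (results, failures) so the caller can decide what to report.
    """
    results, failures = {}, {}
    for name, task in tasks.items():
        try:
            results[name] = task()
        except Exception as exc:  # deliberately broad: this is the isolation boundary
            log.warning("task %r failed: %s", name, exc)
            failures[name] = exc
    return results, failures


def broken_script():
    # Stand-in for an item or script that contains a simple error.
    raise ValueError("bad rotation value")


results, failures = run_all({
    "greeter": lambda: "hello",
    "door": broken_script,
    "lights": lambda: "on",
})
```

The failing “door” task ends up in failures with its error, while “greeter” and “lights” still complete. Whether to retry, disable or report the failed task is a separate decision that the caller can make with full information.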

Restart every…

Scheduled restarts are an actual configuration option that can be set, yet they are probably a lot easier to accomplish through external program control systems. Nonetheless refreshing, as we like to call it, does help to maintain a “refreshed” state that is not impacted by long runtimes. Then again, as you can see from this picture

runtimes that exceed days, weeks and months are no rarity anymore. Even areas with greater resource usage can reach the same uptimes. We are all familiar with the method of fixing a program by simply restarting it, to reset everything to the start and clear out any potentially wrong data, but that expects whatever is wrong to be able to reset itself and not reload the bad data. As such, making sure bad data cannot even enter, be that at runtime or after restarting, is a key part of creating long-term stability. The common method of achieving this is defining the types of data carefully and thus not giving incorrect or corrupt data the chance to persist in said types. Maintaining good type definitions throughout a program can be tough, and sometimes you just want a catch-all for some random piece of data you don’t want to strictly define. Therein lies the issue: it can lead to data corruption you cannot detect until accumulation or constant overwriting causes a problem. As such, long-term tests that tax how long a program can clean up after itself are important to verify the true stability of a program and its ability to maintain a clean set of working information.
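As a rough illustration of what “defining the types of data carefully” can look like, here is a small Python sketch. The Position record and the load_positions helper are invented for this example and do not come from OpenSimulator; the point is only that a strictly defined type rejects bad values at the door, so a restart cannot simply reload them.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Position:
    """A hypothetical, strictly defined record: three finite numbers."""
    x: float
    y: float
    z: float

    def __post_init__(self):
        for name in ("x", "y", "z"):
            value = getattr(self, name)
            # bool is a subclass of int, so reject it explicitly
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError(f"{name} must be a number, got {type(value).__name__}")
            if not math.isfinite(value):
                raise ValueError(f"{name} must be finite, got {value!r}")


def load_positions(rows):
    """Keep valid rows and report rejected ones instead of storing them."""
    good, rejected = [], []
    for row in rows:
        try:
            good.append(Position(*row))
        except (TypeError, ValueError) as exc:
            rejected.append((row, str(exc)))
    return good, rejected


good, rejected = load_positions([
    (1.0, 2.0, 3.0),
    (float("nan"), 0.0, 0.0),  # corrupt value
    ("12", 0.0, 0.0),          # wrong type, the kind a catch-all would accept
])
```

Only the first row survives; the other two are reported together with the reason they were refused. A catch-all container would have stored all three and left the problem to surface much later.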

I still crash though

And that cannot be fully avoided, as mentioned above. Often the reason comes close to the famous “unfortunate turn of events”: things come together to bring about a state that cannot be handled, because it was not even expected in the first place. With a client-server type application, the transport of data between the two is subject to various variables of doom, so the results can be unexpected. Even so, recovery from such states is possible and often not that far away. A good example is the main difference between the two common transport protocols used, UDP and TCP; you can look up yourself what those two like about each other and what they hate. The basic concept is that, unlike UDP, TCP actually checks whether or not data has reached the other side. You can imagine this check may make some things slower in cases where a tiny piece of information failing to arrive makes no real difference, so UDP is used to send data that can tolerate small missing pieces, or simply to “fire and forget” because you, the user, are not going to notice the loss. However, in the implementations lie the caveats that can break the camel’s back. UDP in itself is just a sender and receiver of data; what you do with the data is up to you. You could, if you wanted to, send data and request verification for it manually, all while still using UDP; that is what, at least in some form, is used to send some of the most requested types of data we encounter. The system is more solid than it sounds, but can still sometimes miss the mark. Normally this is not much of a concern as the data is simply resent, but during that period other tasks may be held up and eventually you run into being disconnected due to a timeout. This issue can be compounded by the amount of data that needs to be sent for certain things, so the more you use it the worse it can get.
Thankfully, these days methods exist to cache data and to resend only the missing parts when the whole doesn’t reach the other end, and so stability can be increased. It doesn’t solve the elephant’s riddle, but it makes sure it can’t eat all the peanuts.
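The “send, ask for verification, resend only what is missing” idea can be sketched in a few lines of Python. This is a simplified model and not the actual viewer protocol: there are no real sockets, the lossy_send callback and the make_channel helper are invented for the example, and the losses are scripted so the behaviour is repeatable.

```python
def send_reliably(chunks, lossy_send, max_rounds=10):
    """Deliver all chunks over an unreliable channel.

    `lossy_send(index, chunk)` is a hypothetical transport call that returns
    True when the receiver acknowledged the chunk and False when the chunk
    (or its acknowledgement) was lost. Only unacknowledged chunks are resent.
    Returns the number of rounds used; raises TimeoutError if rounds run out.
    """
    pending = set(range(len(chunks)))
    for round_no in range(1, max_rounds + 1):
        acked = {i for i in sorted(pending) if lossy_send(i, chunks[i])}
        pending -= acked
        if not pending:
            return round_no
    raise TimeoutError(f"{len(pending)} chunk(s) never acknowledged")


def make_channel(drops):
    """Simulated receiver: `drops[i]` is how many times chunk i is lost
    before it finally gets through."""
    remaining = dict(drops)
    received, attempts = {}, []

    def lossy_send(index, chunk):
        attempts.append(index)
        if remaining.get(index, 0) > 0:
            remaining[index] -= 1
            return False  # dropped in transit, no acknowledgement
        received[index] = chunk
        return True

    return lossy_send, received, attempts


chunks = [b"part-%d" % i for i in range(5)]
send, received, attempts = make_channel({1: 1, 3: 2})
rounds = send_reliably(chunks, send)
```

With chunk 1 lost once and chunk 3 lost twice, the transfer needs three rounds and eight sends in total, instead of resending all five chunks every time something goes missing. The max_rounds limit plays the role of the timeout described above: if a chunk never gets through, the sender gives up rather than holding everything else hostage forever.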

Who’s to blame?

Blame has been thrown at everyone under the sun, from users to developers to operators and your uncle Joe, but in the end it’s a community effort, specifically in terms of educating toward better usage of resources, adequate methods of use and restraint. Nothing’s perfect, but if handled carefully and with a bit of understanding most of it will work just fine. It really is not the fault of any specific people; what matters is what we all do to spread information about things to avoid, things to practice and simple guidelines to follow that ensure a good experience for all, including the poor programming code underneath it all. With enough effort and care the overall experience can be massively positive, but that requires everyone to work together and realize the limitations of software, hardware and people alike. The future will be a bright one, if that is on everyone’s mind.

New Hardware, New Systems

In our continuous pursuit to enhance our network and customer service we have recently begun rethinking our backup strategy. With the advent of new requirements following 2020 we will be required to store up to 12 months (as opposed to 4 months) of the data our customers generate on our network. This new requirement, set forth by German lawmakers, means we require much more storage space. To meet the new storage requirements we have changed our backup strategy and ordered new hardware to handle the additional data influx.

Along with the new hardware we have rolled out a new ingress system. This new system will handle customer requests for deployment of their own data into our network. With mesh becoming ever more present in our worlds we can see the increase in size of customer data. As such the new ingress system supports file sizes big enough to handle all this data.

But wait, there is more. With the aforementioned increases in customer data comes a problem for making all this data available to our customers. To solve this problem we have also deployed a new transfer system that will securely handle data requests.

With these two new systems we will be able to support the projected increase in data sizes and provide our customers with the peace of mind that up to one year of their data is securely stored and accessible in case of catastrophic failure. It will also provide us with an interesting metric on how data is evolving and what needs to be done to future-proof our systems against these new loads.

Developer Correspondence

Every now and then voices are heard within the OpenSimulator community actively targeting the developers behind the core version of OpenSimulator, specifically with the complaint that correspondence is being ignored and user requests are not met with an appropriate response. This sentiment has led many to not actively engage in the channels of communication provided by the project, as well as to form committees and organizations designed to “ease the connection between users and developers”, often with the exact opposite effect. While this is a valid concern for any project, and communication between users and developers is always strained by many factors, the core OpenSimulator developers have been doing respectably well in their communication efforts.

This is specifically highlighted each time Zetamex has proposed changes of various degrees to the project. As you may know, Zetamex maintains a fork of OpenSimulator that leans toward extending the functionality and customizability of OpenSimulator to suit more people’s needs. Historically these changes were kept exclusive, but recently we have made a few of them available to the core project and engaged in discussion about bugfixes and new features as well. This communication, while not always fruitful, has so far been fairly positive; discussion has been civil and solution-oriented and has produced some welcome changes and improvements.

The dialog first began over two years ago with the apparent problem of interruptions in the movement of avatars on regions. These spikes and delays, caused by some complex physics calculation and prioritizing, were solved within a few days after testing and continuous dialog helped pinpoint the issue. What followed were changes to scripting functions like llDetectedGrab to better reflect the expected behavior and to solve a rotation gimbal lock. Most recently, just yesterday at the time of this post, a constant that was not being obeyed by an OSSL function was addressed. The time from reporting the problem and discussing potential solutions to having a fix was mere hours. Today the enhancement was properly signed off and confirmed working, further extending the capabilities of OpenSimulator.

These repeated cases of a problem being addressed and solved in a timely fashion, with great regard toward the least inconvenience to users and careful consideration of the extent to which new functionality should go with regard to performance, show us a real commitment toward the future of OpenSimulator and display the great passion of the developers, specifically one Antonio Jose Almeida Leal Duarte aka Ubit Umarov. We believe the project has a true future with the passion that has recently been displayed toward improving and expanding OpenSimulator.

The same, however, cannot be said about the other side of the coin: the Viewers. Historically, changes to OpenSimulator requiring changes to the Viewers have not been met with the same kind of passion or even understanding. Usually the correspondence is very one-sided toward not implementing any features simply to support OpenSimulator. That is despite some viewers explicitly aiming toward “supporting” OpenSimulator and its development. A good example is the changes required to the syntax of OSSL after new functions were introduced. To a simple request to update a single file to contain the new definitions based on recent changes, the response was a complaint about the rapidly changing nature of OpenSimulator at the time. Other requests to adjust viewers to better reflect capabilities within OpenSimulator were met with bewilderment as to why such capabilities were added in the first place. This, among many other instances of rejection and hostility toward OpenSimulator-specific changes and issues, has left the relationship between the core OpenSimulator project and Viewer developers somewhat strained.

For the last few years the call for a true, maintained and non-SL Viewer specifically for OpenSimulator has been widely discussed, but no party within the community has committed itself to this. Attempts have been made numerous times, but none have so far produced an equivalent project carrying with it the guarantees and structure required to truly represent OpenSimulator as an official Viewer or associated project. Various capabilities that would be within reach of OpenSimulator are held back by this fact, and so improvements can only be achieved in the realms of bugfixing, performance and stability. New and exclusive features requiring handling on the side of the Viewers are very unlikely to be implemented.

The Viewers that have historically committed to OpenSimulator-specific versions and offered implementation of features are, at the time of writing this, not looking too healthy, and an official association is not sought after by the core OpenSimulator project. Even the testing of most bugfixes and performance enhancements is done with existing, SL-first Viewers. We dearly hope this situation improves at some point, as we can see a great deal of potential in changing some of the core capabilities of OpenSimulator to better reflect current technology and code standards, along with simply expanding their functionality into new realms.

We will maintain the correspondence we have with the core OpenSimulator project and aim to advance the correspondence with the various Viewer projects in an attempt to create a better dialog. However, it is still unlikely that one of the existing projects will gain the momentum required for a true OpenSimulator-first Viewer.

We encourage any user of OpenSimulator to browse the links at the bottom of this post regarding some of the features being requested and implemented in the core version of OpenSimulator. It is not always easy to create a dialog between users and developers, and this has caused and will cause friction. However, the many organizations that have formed to ease this dialog have so far not managed to close this divide. More so, the developers have expressed a clear distaste for the attitude and entitlement these groups have shown in the past. It is not likely the divide will ever be closed, but it can be reduced somewhat through empathy and an understanding that many things sound easier than they are to implement. Those with ideas should not keep them to themselves, but we encourage anyone with an idea to first speak to others and attempt to gain an insight into the processes and workings of OpenSimulator before attempting to create a feature request or bug report.

Lastly, as this is the last blog post for this year, we wish everyone, users, developers and spambots alike, happy holidays and a happy new year!

Working At Zetamex

Zetamex Network has always been about community and giving back to it. As such we love to give up-and-coming developers looking to practice and hone their skills an opportunity to get real-world experience working on the projects we develop. That means putting learned skills to the test on actual production systems, working with an experienced team and creating things that have a true impact. We have opened new internship positions for those looking to do just that; more information here

New year, new goals

We hope you all have had some nice holidays and survived the New Year’s parties without any major hangovers. We used the last couple of months of 2017 quite productively: we moved to a more efficient setup, new people joined the team and new technology was deployed. As with most things, the holidays have slowed things down a bit, but we expect things to start picking up again in the coming weeks.

As a New Year’s resolution we have decided to provide our customers and to-be-customers with some insights into running a grid and what responsibilities that includes. If you are interested in opening your own grid or just curious as to what goes on behind the scenes, keep reading.

The best comparison that can be made in terms of workload and responsibility is that running a grid is like being the mayor of a small town. You, as the grid owner, bear the responsibility for the well-being of your residents along with all the trouble they cause each other, yourself or others. This usually goes further than most imagine; still, it can be broken down into a few key points that are of importance.

Market

At the moment of writing this there are over 200 grids out there that are openly accepting registrations and sit open on the hypergrid. This means that there are 200 different concepts, ideas and visions for what “the perfect grid” may look like. It is, for lack of a better word, a saturated market. This is not likely to change either, and so the competition for users, content and activity is fierce and often not all that fair. This often leads to feuds and drama between grid owners, creators and users, everyone trying to claim a piece of the cake for themselves by any means necessary. As dark a picture as this may seem to be, there is an equal and opposite side of genuine sharing and caring, with the aim to collaborate and build a better experience for users no matter what grid they belong to. Though, as with most things, the bad side is often more publicized, as dirty laundry makes for greater headlines.

Legal

As mentioned previously, the responsibility for everything on a grid lies squarely with the grid owner. This means that writing up proper Terms of Service is a must. Dealing with liabilities, international laws, local laws and claims can quickly consume all your time. Lawsuits can happen and have happened, and especially when it comes to commerce and actual money things can quickly go to court. It is important to have good knowledge of international and local laws pertaining to virtual worlds, e-commerce and consumer protection. It also does not hurt to have a good understanding of content policies and copyright. These are areas often overlooked, and very few grids actually have proper Terms of Service or End User Licenses covering them. Going into further detail or attempting to cover all bases would likely take up hundreds of pages. The best option is to either seek the assistance of a lawyer or, if you like reading, read up on local laws and the laws of the countries you do business in.

OpenSim

OpenSim is, by definition of the project, alpha software. This means that it may not be stable, may crash at any time and may eat all the work you just did. While we have made some changes to it to increase stability, even we cannot guarantee its stability. Making regular backups and understanding the risks is a big part of running a grid, along with understanding the limitations OpenSim has. If you throw enough at it no hardware will be able to compensate and performance will suffer. Managing load and keeping users from overloading their regions is another big part. OpenSim can handle a lot of things if done properly, but one rogue script or item can easily bring everything to a halt. Knowing these things and how to debug and solve these issues is of vital importance if you want to make that all-important good first impression.

Money

Running a grid, first and foremost, is about handling money: paying for hardware and service, collecting payments for regions and dealing with commerce. Just paying bills and making sure others pay you, however, is only a small part of it. As soon as money starts changing hands, various laws, regulations and requirements come into play. From filing taxes to issuing refunds, all of that needs consideration and lots of reading. While it is not strictly necessary to register as a business when running a grid, once things start picking up and more and more money is involved it will eventually raise some eyebrows, and you might find yourself having to pay extra tax or even fines. As with the legal stuff, specific examples depend on so many things that listing them all would take up pages, and we simply can’t put that much on the blog.

Humans

It can be easy to forget that each and every user is a human being and as such has their own free will. As annoying as that may be at times, it can also be the greatest asset. Dealing with that asset is another big effort and requires constant attention. One has to realize that as grid owner, in some ways, you also represent the entire community around OpenSim. Further, since not every user comes from SecondLife or has a computer science background, dealing with questions, concerns and issues in a manner ranging from “Explain like I’m Five” to technical moon-speak is part of running a grid. End-user support, as it is called, is a big part of the daily tasks and can make or break a grid. As with most good deeds it is rarely recognized, so beyond the work itself the commitment can be difficult to keep up. That is the other human factor in the equation. You, as grid owner, have to keep yourself invested just as much as, if not more than, the users on the grid.

Hardware

While having a degree is not required, having a good understanding of hardware, its capabilities and limitations does help quite a bit. OpenSim has very specific requirements in terms of hardware, and hardware choices can have a massive impact on performance, stability and ultimately the money in your pocket. Beyond the physical machinery you run OpenSim on, a big part of the performance comes from the network. Adequate bandwidth and connections are vital, especially if you plan to attract a worldwide audience. We generally advise on those things for our Managed plans, which have us manage OpenSim on hardware that you provide. In all of that it is important to keep a watchful eye on scale and scalability. When a grid starts growing, choosing what upgrades to make and what areas need additional hardware to handle the extra load depends not only on OpenSim itself, but also on its dependencies such as databases, the filesystem and more. It is equally important to keep an eye on what users do, as a performance bottleneck may not be the hardware’s fault after all.

If all this seems like a lot to consider, that is because it is. As mentioned before, running a grid is like running your own little town or a busy restaurant. It is not something to take lightly, and while we generally encourage anyone looking to open a grid, we would be doing no one a service if we did not educate them about these concerns and stipulations. We hope that this has provided at least some insight into what goes into running a grid. There is plenty more that goes into all this, and it is in many ways a never-ending story, as dynamic as life itself if you want to be philosophical. In the future we may touch on specific subjects with some actual examples, but for the moment this will have to do.

Adding flavor to OpenSim

Open-source.

The term has become more than what it originally described. Nowadays a whole culture and mindset is attached to it. At its core all it describes is that the source code of a piece of software is available for anyone to easily view and use themselves, without having to reverse-engineer anything. Open-source and the whole FOSS philosophy are about sharing, contributing to and furthering software. Many projects rely on volunteers to make additions, fix bugs or maintain compatibility, but they are not solely bound by that either. Many projects receive help from companies that use the software, and some are developed by such companies in the first place. These companies usually make a profit by selling support for the software, or additional pieces or versions not available in the open-source version.

OpenSim is open-source as well. It has historically been developed by volunteers aiming to create a virtual world platform similar to SecondLife. At the time of its creation it truly had the philosophy of an open-source project. Over time the people working on the project left and new ones came in. With new people came new directions, and those directions now differ somewhat from what they were initially. These days OpenSim is being developed by people who have a commercial interest in the software and may want to sell any fixes or additions to it. This, as described above, does not break the nature of the open-source idea or philosophy for that matter. At the same time it does mean that the project has somewhat deviated in its course from making the software better to, well, making money. Again, there is no issue with that in itself, but it must be said that this does create the possibility of additions or fixes being rejected based on the commercial purposes the development team uses the software for. This has led to a feeling of stagnation and of rejection by the development team among the community surrounding OpenSim.

We have always supported the decisions of the development team and have made efforts to support them financially and with additions to the code; however, the response we have gotten has been rather negative. We no longer see a future for the development of OpenSim with its current development team. As a result we have decided to create our own flavor of OpenSim in order to make sure development continues with the aim of making the software better. In technical terms it’s called a fork, a divergence from the original. We will start to implement and change OpenSim based on the needs of our customers and, to an extent, the feedback from the community as a whole. We will continue to support those who want to better the original OpenSim, but the internal politics and money involved for the current development team have led to an unsatisfactory state for us and our customers, and we would be mad not to attempt to correct it. This does mean that besides the original OpenSim we will be offering our own ZetaSim to our customers. Additionally, we will make efforts to support other forks with the same intent of furthering the development of OpenSim and contribute some of the changes and fixes we have developed for ZetaSim. We feel that is the most liberal decision we could make given the current state of development and the need to support our customers as well.