Applying The Wrong Concept The Right Way

In our ongoing, albeit somewhat irregular, series of technical posts we once again want to bring you a deep dive into a topic that is not often touched upon. This time the focus is on applying microservice and clustering concepts to a piece of software that does its best toddler-temper-tantrum impression of not wanting to do its homework.

Microservices

In the world of “web applications” the containerization and clustering of applications through various concepts, layers and confusing config files is a landscape full of wonder and pretty explosions. For most of them, let’s call them websites from now on, since that is what they are, these setups are not all that useful, since they mostly apply to projects of vast scale. Nonetheless some still fall into the trap of pretty buzzwords and promised gains; supposedly that is easier than blaming oneself for the code not being optimized or the hardware being overloaded as is.

Microservices, in most cases, describe the concept of splitting a large application into smaller parts, each handling a specific task: given some input, produce some output. This goes along with clustering those parts across vast networks to position them closer to the user and scaling them as markets grow or shrink. For large platforms with thousands of users this makes the most economical sense, since the solution in the past was to simply slap the entire app onto ever-growing hardware, which did not scale performance and cost all that evenly. Thus the concept of distributing load and splitting things into the smallest possible parts to make them more efficient has helped the internet grow, and helped certain companies and platforms make billions while slashing their IT budgets.

Load Balancing

Not a new concept by any means, but an ever more important one these days. A single point of ingress for data into an application serving a wide range of potential sources means potential bottlenecks on the horizon. Producing the output from that single point generally results in a cascade of ever slower processing until you hit the inevitable timeouts. Balancing this load through microservices or caching mechanisms is common practice, and not just in the world of websites. Any type of application, down to the very browser you are reading this through, subscribes to the concept of load balancing in one way or another. At the core of the solution is spreading the load across any sort of multiplication that does not rely on other parts to process the data. In most programming languages this is known as asynchronous processing and generally goes hand in hand with the object-oriented programming style that allows it to work in the first place. The idea, then, is to allow all parts of an application to run and finish on their own time without causing the whole thing to grind to a halt, even if that, in the name of keeping the end results in sync, sometimes cannot be avoided either.

Where does OpenSim come into this though?

This is where it gets really interesting, because OpenSim has been built from the ground up to split individual processing into separate parts that can run on their own. These individual services are often asynchronous as well and can even be split out and distributed. This design allows for applying both the concept of microservices and the load balancing that comes with it. However, that is easier said than done. As it turns out, the interconnection between the services, needed to make sure once in a while that all that asynchronous data actually makes any sense at all, is not a straightforward affair. More so since changes and new features demand direct connections to other services that absolutely cannot wait for anything else to go on.

In the past there were attempts to resolve this by simply creating another OpenSim process running as a sort-of backup, receiving the same data and processing it independently. Should its result arrive faster than that of the main process, it would be used instead. This went along with splitting services out into their own instances as well, but the resulting complexity, and the requirement to test each new change to make sure it did not severely break the chain of data processing, meant this project never really went anywhere beyond a working prototype.

That’s not to say the attempt itself did not emphasize the need to maintain the service-based setup of OpenSim. Thankfully, for the less complex part of providing the main services that connect the assortment of simulators into a cohesive world, this has been maintained. What is commonly referred to as the Robust services can generally still be split apart and even run as copies of each other. This leaves the door open for applying both the concept of microservices and load balancing to it. Though, as already mentioned, there are a few things that have managed to become rather large pitfalls for anyone looking to attempt it.

Robust, a simpleton with an attitude

To begin let’s go over the goals and requirements.

  • Split as many services contained in Robust as possible into their own instances
  • For services with a potential to overload from data ingress or processing, spawn multiple instances and distribute the load between them
  • Set up connections to each instance in a manner that allows for effective load balancing and reduces the complexity of setup for simulators connecting to them

To achieve these goals we can use a few methods already available, some of which require a bit of tinkering, plus some external systems without which nothing would work. Let’s go over each part.

Robust

With the aforementioned splitting in mind, the basic configuration file for a single Robust instance already contains a list of the services it runs, as well as their definitions further down below. All we thus have to do here is select the services we want to run in each instance and make sure that in the end we have instances for all of them. However, this idea rather quickly gets thrown out the window when looking at the actual service definitions. The problem sits in the connections services have with each other. While a lot of them point to one another via either a local service definition or an external connector, there still exist some that flat out assume a copy of the service is running in the same instance. So the difficulty is now up a notch: finding the services that have to go together to share data.

Connectors

Most connected services refer to other services via the direct connection established over the addins present as part of the Robust system. We can see these as DLL files describing each service. However, in order to allow multiple instances, or indeed other parts of the entire software, to communicate, there exist Connectors. These are also DLLs, but their setup is somewhat different in that they provide a remote-bound connection to a service defined not by the addin, but by a URL. This means we can change our service definitions to these Connectors to let them reach a service running in a different instance.
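As a sketch, here is what the two styles of definition might look like for an asset service. The DLL and section names follow common OpenSim conventions, but the URL is a placeholder and the exact names may differ per version:

```ini
; Variant 1: the service runs locally inside this Robust instance.
[AssetService]
    LocalServiceModule = "OpenSim.Services.AssetService.dll:AssetService"

; Variant 2: the service lives in another instance, reached via a Connector.
; (Only one of the two variants would appear in a given instance's config.)
[AssetService]
    LocalServiceModule = "OpenSim.Services.Connectors.dll:AssetServicesConnector"
    AssetServerURI = "http://assets.example.grid:8003"
```

Swapping a local definition for its Connector counterpart is the core mechanism that lets a service move to another instance without the rest of the configuration noticing.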

Siblings

Splitting everything up into pieces is one part of resolving issues created by overloaded services, but eventually even that is no longer enough to handle the influx of data. As such, applying the idea of load balancing by creating copies of a service becomes a requirement. Unfortunately this presents an issue when we want to make sure the individual copies are still able to share data with other parts, whether in the form of connecting multiple services to a single dependency or the other way round. This is where we have to resort to external software to group multiple instances of the same service under a common umbrella through which we can establish connections with it.

Includes

When attempting to set up a vast array of instances, each requiring its own little configuration changes to interconnect properly, the challenge quickly becomes less of a technical issue and more a case of not getting brain freeze in the process. Writing a full configuration for each node, requiring hundreds of lines each time to provide all the information it needs to run, is tedious and easily produces mistakes. Thankfully this is something that has already annoyed at least one person before, and to our advantage this person has done something about it. Configurations are capable of loading data from other files and combining them into a fully qualified instance configuration. This means we can configure each service once for connecting locally and once for connecting remotely, and simply mix and match the required parts via include directives. We can now simply select what an instance is meant to run as a local service and what it should connect to remotely.
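A minimal sketch of what such a composed configuration could look like; the file names are made up for illustration, while the `Include-` key convention is the one OpenSim’s ini loader uses:

```ini
; Top-level configuration for one instance: mix and match the parts.
[Includes]
    Include-Common = "config/common.ini"            ; shared basics (database, ports)
    Include-Local  = "config/local/asset.ini"       ; services this instance runs itself
    Include-Remote = "config/remote/inventory.ini"  ; Connectors to other instances
```

Each instance then differs only in which handful of include lines it carries, rather than in hundreds of duplicated configuration lines.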

Hacking

This is where it gets complex. In order to reduce the load created by asking the same questions over and over again, some services rely on caches. These will cache a request for certain data, allowing it to be delivered without needing to retrieve it from data storage. Unfortunately these caches are local to the specific service; if we attempt to multiply that service, there is a chance of cached data corrupting actual data entered on a sibling. To combat this issue we have to go deep into OpenSim, find the caches and either remove them entirely or change their behavior so they are not in use when multiple instances of a service are being run. In this case the better and more compatible option is to look for each part of the code that either requests or enters data into the caches and make these actions dependent on a flag that either allows them or not, with the latter defaulting back to retrieving or storing data in the database directly, as if the cache had no entry for it.
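In rough terms the flag-gated cache access looks something like the following sketch. None of these identifiers are actual OpenSim code; it merely illustrates the pattern of falling through to the database when siblings are running:

```csharp
// Sketch only: cache reads and writes gated behind a config flag so that
// sibling instances can bypass their local caches entirely.
public AssetBase GetAsset(string id)
{
    if (m_cacheEnabled)                    // set false when running siblings
    {
        AssetBase cached = m_cache.Get(id);
        if (cached != null)
            return cached;
    }

    // Fall back to the database, as if the cache had no entry.
    AssetBase asset = m_database.GetAsset(id);

    if (m_cacheEnabled)
        m_cache.Cache(asset);

    return asset;
}
```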

Long Term

As changes to core parts of OpenSim are still being made, both to update some of the ancient standards used when it was originally conceived and to add new features requiring additional code, the long-term stability of this is still in question. Changes already made to some parts do already cause some instability and require long-term testing, as well as further changes to mitigate. As such this setup will likely require further “hacking” and even changes to the setup itself to account for changing service relations. As of yet it is unclear whether changes to the service interrelations will be made that retain or even enhance the ability to split each service, but we certainly hope so. Increasing data sizes and ever more growth will test the infrastructure, and the more a setup can be spread out and its load distributed among the parts, the more solid it will be in the future. As with everything, it requires testing and more testing and ever more testing to identify issues, but as OpenSim is still in development that is frankly a given constant already.

The gritty bits

Having completed the crash course in Robust setup let’s create a hypothetical situation realistic enough to warrant creating a solution for.

Say we have to deal with over 10,000 users logging in throughout the day, each having thousands of items in their inventory and each being an overly active member of the community, chatting and roaming the world with vigor. How do we handle the influx of hundreds of requests per second?

Let’s go over each part.

1. Nginx

Nginx is a webserver with load balancing capabilities through the use of a proxy setup. This sounds complicated, but is actually relatively easy. What we need to do is set up a hostname for each individual type of service we want to run instances of. Then we pass requests arriving on these hostnames to a set of instances by proxying each request to the ports used by those instances. This takes the form of server definitions with a proxy pass to the upstream ports used by the instances.

An example:
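A sketch of what this could look like for three copies of an asset service instance; the hostname and ports are placeholders for whatever your own setup uses:

```nginx
# Three sibling instances of the same Robust service, balanced round-robin.
upstream robust_assets {
    server 127.0.0.1:8103;
    server 127.0.0.1:8104;
    server 127.0.0.1:8105;
}

server {
    listen 8003;
    server_name assets.example.grid;

    location / {
        # Hand each request to the next sibling in line.
        proxy_pass http://robust_assets;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```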

We can do this for all instances, multiple or singular, passing everything over a central port, thus making the configuration of simulator connections relatively easy. Nginx handles routing the requests in a roughly round-robin style. This means it is not directly aware of the load placed on each copy, but simply handing each request to a different copy is likely enough. If necessary we can always add more copies.

2. Robust

In order to make it easier to run a large number of copies, instead of multiplying the binary as a whole we simply treat it as a template to spawn copies from. This requires providing each instance with the information of where its configuration should be loaded from. We do this by adding the inifile parameter to the execution command, pointing it at a single file containing the aforementioned definitions and includes.

An example:
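A sketch of such a launch, assuming hypothetical configuration paths; the binary stays in one place and only the -inifile argument changes per copy:

```shell
# One Robust binary acting as a template; each copy gets its own config.
cd /opt/opensim/bin

mono Robust.exe -inifile=config/instances/assets-01.ini &
mono Robust.exe -inifile=config/instances/assets-02.ini &
mono Robust.exe -inifile=config/instances/inventory-01.ini &
```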

We configure each service as normal, making sure to use the Connectors for the remote counterparts. As mentioned above, this structure looks confusing at first, but it is actually a lot less work, as we simply combine what we need rather than writing the config sections out in each file. Organizing the local connectors for services included in the specific Robust instance we configure, and the remote ones connecting to other Robust instances, into folders makes it easier to see what’s what.

3. Simulators

Connecting a simulator to this setup is remarkably easy given the complexity of what it is connected to. For the most part we can use the hostnames to connect the simulator services to their Robust providers. Only select services, GridInfo in particular, require a more direct connection. This also goes for external asset servers, which we hope will become more common, as assets are the second biggest bottleneck in OpenSim.

A rough example of the setup:
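A sketch of the relevant entries on the simulator side, using the hypothetical proxied hostnames from above; the section and key names follow OpenSim’s GridCommon.ini conventions, but check them against your own version:

```ini
[AssetService]
    AssetServerURI = "http://assets.example.grid:8003"

[InventoryService]
    InventoryServerURI = "http://inventory.example.grid:8003"

[GridInfoService]
    ; GridInfo is one of the services that wants a direct connection.
    GridInfoURI = "http://grid.example.grid:8002"
```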

Configuration depends on how you set things up and what split of services and instances is done. As mentioned, we don’t need to worry about setting up specific ports for each service, as the individual parts are already proxied through to their respective endpoints, which also handles balancing the load. The identification is no longer the port, but the hostname itself.

4. Runtime Environment

This section is somewhat optional, but may be of value in the future. A big issue with setting up so many individual services is handling them when restarts and changes are required. As we are dealing with a program that runs independently, we can simply push it to the background and nuke it whenever a restart is desired, but this might incur data loss. A better solution is providing a separate runtime environment for each instance. Under Windows this can easily be accomplished by simply stuffing the window into a corner and forgetting about it; but Windows is not a recommended platform to run services such as OpenSim on, and on Linux this is a bit more difficult. It is possible to simply send the process to the background as mentioned, but then there is no way to interact with it or get it back other than sending data to it, which gives us no feedback. The better option is to use runtime environments, which are plentiful on Linux, such as Docker or LXC for containers, or more simply things like “screen”. The latter provides “windows” we can select at will to interact with each instance, both to send commands and to view the process working. Which of these works best depends on familiarity and what level of separation you want for each service.
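As an example of the simplest option, “screen”, each instance can run in its own detached session that we can re-attach to at will (session names and paths here are made up):

```shell
# One detached screen session per Robust instance.
screen -dmS robust-assets    mono Robust.exe -inifile=config/instances/assets-01.ini
screen -dmS robust-inventory mono Robust.exe -inifile=config/instances/inventory-01.ini

# List running sessions, then attach to one to watch or type commands.
screen -ls
screen -r robust-assets
# Detach again with Ctrl-A d, leaving the instance running.
```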

The Grand Solution

To test this setup we have created a testing environment running this setup:

  • All traffic routed through Nginx to reduce the complexity of connecting simulators
  • Each instance having only minor configuration changes, in the realm of setting the port to use
  • External and internal routing for service connections also done via the proxy, to reduce the complexity of service interconnection and take advantage of full load balancing of all requests
  • Configurations based primarily on includes rather than full configuration files, reducing complexity and clutter
  • Instances spawned from a template binary to reduce the complexity of upgrades
  • Simple configuration retained for simulators connecting to services, without the need to specify individual ports
  • Services split logically based on the load encountered, keeping together services that have no ability to connect remotely to integrated services
  • Minimal changes to OpenSim itself

This is obviously not the solution to all potential problems and there is no guarantee future changes won’t break a setup as complex as this. Certainly we hope for the opposite since there is only so much a single fully qualified instance of Robust can do on its own and hitting that limit is not a pretty sight.

Applying the concepts of microservices and load balancing to OpenSim may seem wrong, and there are certainly many obstacles in the way of doing so, but the core of it was always part of the idea behind the service-based setup of Robust, or even OpenSim as a whole. Thus these concepts can work for it as well, despite the issues that exist due to inter-service dependencies and caching setups. It most certainly has a ways to go to truly embrace them, but the positive aspects can already be observed. Whether it is a setup as complex and distributed as shown here, or simply splitting out one or two services, the future undoubtedly lies in utilizing them.

Aside from the brief mention of this capability on the official OpenSim wiki and some snippets of configuration options that can be found on the web, this marks the first time it has been fully documented and tested. In the pursuit to fully dissect and test it we share the interest with a number of people who have provided information, time and effort in testing as well. It shows the strong community spirit often associated with open-source projects, and we hope to propagate this to anyone reading this article. We want to thank Gimisa Cerise, who has been pulling apart the hidden and complex inner workings of OpenSim for a long time now, for kicking this project off by providing the initial basis of configuration options and pointers to information. Equally we have to extend thanks to the OpenSim team for providing assistance in tracking down interconnected modules. The continuous effort put into the OpenSim project by everyone involved makes things like this possible in the first place; their ongoing support and work toward the project drives it forward and we are happy to be a part of it, contributing where possible.

We will certainly continue testing and pushing the boundaries of OpenSim to make sure it is prepared for the future and hope this insight into the capabilities it has will provide some positive impact on the metaverse as a whole.

10 Years Of Performance

From the perspective of the average user the intricacies of software development may seem akin to magic or some form of science. The reality is often much less glamorous and filled with frustrations. OpenSim as software is now over 10 years old and in that time a lot has changed. Through new features, improvements and, most importantly, bugfixes it has evolved to now support the continuous growth of so many communities. What was once a large team of enthusiasts has largely turned into a handful of individuals still working actively to enhance the software. Naturally development has slowed down and the focus has shifted somewhat to improving the existing features.

While squashing bugs is an important part of development, another is probably even more important: performance. As the metaverse as a whole grows and features such as shiny new mesh demand ever more performance, whether you are greeted by a slideshow or a fluid movie matters ever more. This especially in times when culture shifts towards demanding engagement in video games, not just pretty pictures.

This is generally an incremental process, but OpenSim is more than the sum of its parts. As software it relies on a framework to avoid re-inventing the wheel and to handle common things without creating even more programming code. As this framework is still in active development it naturally improves as well, bringing new features, performance enhancements and new concepts to the table. The framework underneath is .NET/Mono, which is now entirely owned by Microsoft. Previously the Mono part was in the hands of Xamarin and independent; as a project it aimed to provide a .NET environment for the Linux kernel. Not that long ago, however, Microsoft bought Xamarin. What you take away from that acquisition is still up for debate, but it has so far brought some advancements to Mono.

OpenSim has somewhat embraced this change and eventually switched to newer versions of the Mono framework as its basis. Driven mainly by the new features and performance of Mono, this change has brought many things with it. From the initially bumpy start most of the teething issues have been resolved, to the joy of anyone now working with OpenSim. Adjustments to the code and the application of new concepts have meant that OpenSim performance has improved significantly. Especially in the active development branch this improvement is quite striking to see. We recently saw this first-hand.

ZetaWorlds, our in-house grid, recently turned 8 years old, and to celebrate that occasion a large party was held on one of its regions. This saw nearly 40 people attending the celebrations at some point, which does put a not insignificant load on the various aspects of OpenSim. It was thus rather reassuring to see that performance was not only stable but, in comparison to what it would have been 10 years ago, a lot better than expected. Short of staging an actual test to find the breaking point of it all, it did serve to illustrate how far the improvements have come in a decade.

A measure of performance in OpenSim is generally bound to the frames per second the simulator can produce running the region itself. In an ideal scenario this would be 55 frames per second, which serves as the maximum and stable point. Any number lower than this constitutes a backlog situation, where there are more things to process between each frame than can reasonably be achieved in the allocated time. Situations that produce a lower number can often cascade further, essentially grinding everything to a halt. On the side of the actual hardware running OpenSim there is no real measure of performance; instead we are looking at the amount of resources consumed by the OpenSim instance. This is the same for any process running on a computer, with memory and processor time consumption being the important metrics to monitor.

Not too long ago the general consensus was that each avatar on a region would consume 150-250 megabytes of memory and a good 10% of processor load. As resource usage increased, the point at which OpenSim could no longer keep up handling all those resources, and thus had to reduce the number of frames it could produce per second, was generally reached around 30-40 avatars. This would often mean many gigabytes of memory usage and nearly filling most consumer-grade processors of the time. So the following performance improvements are rather striking to see.

It goes without saying that this was an almost ideal situation, with the avatars not engaging in a contest of who can throw the most physical objects at each other, which would likely not have had the same results. It is also important to mention that especially when loading a new avatar onto a region, much like with any significant change, there are times OpenSim chokes on the amount of data to process. This often results in temporary freezes, which resolve themselves, but have a clearly visible negative impact on the average frame time.

While there is still a ways to go, as is evident from the processor usage being pretty high, an event like this being possible without causing anywhere near the expected resource usage is a massive improvement. From these metrics it seems reasonable to assume that even twice the number of avatars should, given they are not trying to have a bumper-car session, be within the realm of possibility.

Much like ZetaWorlds we are looking forward to what the future may hold and hope the quest to improve performance and reduce resource utilization will make even larger events possible. We thank the ZetaWorlds community for making this, in a way unscheduled, performance test possible and, obviously, for celebrating 8 years of success with us.


PayPal Automatic Payment [Updated]

PayPal is ever evolving, both in good and bad directions. As you may have noticed, if you had selected automatic payments via PayPal, all of these automatic payment plans have been cancelled by PayPal. The information given is sparse, but appears to point at a mismatch with the pro-rata billing cycle we use, creating invoices on the first of the month rather than on the day the service order was received. PayPal attempts to collect the automatic payments from the first of the month to the date of the order, which results in numerous failed attempts, as funds are only released close to the order creation date. We believe PayPal has thus elected to cancel all automatic payments that received too many failures in the past. We have tried to reach out to PayPal for more information on how to prevent this in the future, but are still awaiting a response. For the time being we have disabled automatic payments through PayPal, to avoid potential further issues from the failed attempts and to stop the creation of new automatic payments that would be cancelled immediately. We apologize for the inconvenience this may cause; we are working on a solution.


[Update 21.10.20]

We have heard back from PayPal with the information that the requests sent to their API contain the correct field for changing the pro-rata dates, but their API has bundled the calls into a new section. This means we have to wait for our billing system provider to adjust the PayPal gateway to send the data in the correct format. As the changes are fairly recent, it seems the update to the gateway is still in progress and the current version of the system does not yet contain the changes needed for pro-rata billing to work properly on subscriptions. We will continue to monitor this and have expressed to the provider our requirement for this system to work, alongside many others expressing the same need for this change to be accounted for. We expect this to take a few more weeks to be fully resolved and will re-enable subscriptions as soon as we can verify the gateway is working properly for pro-rata subscriptions. We apologize for the inconvenience.

Service Provider Regulation For Enhanced Security

As an EU company we are bound to abide by the regulations set forward by the EU as implemented by our country of residence. In some cases these regulations are already reflected in local law and can even exceed the requirements set forward by the EU. In the latest round of regulations, passed at the beginning of 2020 to enhance the security of online banking and money transfers in general, the EU has set forward rules that are supposed to make it more difficult to gain unauthorized access to banking accounts, through a secondary form of verification.

In reality the implementation of these regulations falls short in a lot of areas, and a lot of banking institutions and others handling money transactions are only partially following them. This is despite the clear outlines for what such an implementation is required to provide. Even more concerning is that these implementations often disregard the provision allowing customers to willfully opt out of the additional verification step. In turn this results in pages and pages worth of complaints against such institutions, both for making the steps mandatory despite their optional status and for the lack of complete adherence to the regulation in general, leaving, in the worst cases, anyone unable to procure or provide verification without access to their funds. The latter is, in a lot of cases, downright discrimination against people with disabilities.

We have elected not to implement these regulations ourselves; not that we need to, as we do not handle sensitive banking information on our end. We do, however, see changes in how our customers pay for their services, which are undoubtedly the result of how the regulations have been implemented by the various payment providers we work with. While this is not strictly our responsibility, and far out of our reach to do something about, we still want to apologize for any inconvenience you may experience paying for your services. We have launched formal complaints with at least one of our payment providers regarding their failure to implement these new EU regulations in a manner that both satisfies the guidelines set forth in the regulations themselves and complies with disability legislation.

If you are still having issues with processing your payment with one of our payment processors please do not hesitate to contact us via ticket.

OpenSim Archive – A call for contributions

Zetamex Network is proud to be the sponsor of the OpenSim Archive, a project that aims to centralize a library of resources for and around OpenSimulator, for the benefit of creators and those who want to become one. Based on the idea of sharing resources and knowledge, which is one of the core principles behind open-source software and the entire FOSS movement, and of not re-inventing the wheel for every car that’s made, this library aims to aid content creators and everyone else alike. We hope the project will grow and creators will contribute to the enhancement of the worlds we spend our virtual lives in. The project is made available to everyone free of charge and even includes things such as old software, compiled OpenSimulator binaries and other useful items. Zetamex Network provides the storage and bandwidth for this project and will handle submissions.

If you are a creator or know of publicly available resources with open licenses, we ask you to get in contact with us to contribute to the library. You can find a contact point on the right, just click the big yellow button or send us an email!


Expanding Reach – New Payment Gateways & Updated Transaction Fees

The world is a very diverse place with lots of different attitudes to monetary transactions. Philosophical aspects aside, the main interest for us as a company in this is the diversity of currencies and ways to turn said currencies into goods and services. So in the pursuit of bringing not just more harmony to the world, but also ease of access to our services, we have added new payment gateways. Additionally we have re-enabled an option on an already existing gateway and upgraded another. Without further ado, here they are, with the shiny new one leading the front:

Paysafe

Paysafe offers debit-like PINs that can be purchased with a certain monetary value attached. These PINs can then be used to pay for goods and services online when other payment methods are not available. They have been around for quite some years now and are often the only way for people without access to a bank account to pay for goods and services on the internet. Given they can be purchased with cash, they are also quite popular with young people who may just get an allowance handed out in cash. We have partnered with Paysafe to bring this payment method to our brands so that more people can enjoy them.

Stripe SEPA

SEPA is a direct debit system that pulls money directly from the associated bank account. It is quite popular in Europe because, once set up, it is almost hassle-free, and as long as the bank account holds the appropriate funds customers don’t have to worry about their bills not being paid. With the worry of paying bills on time out of the way there is more time to focus on the important things in life.

PayPal Subscriptions

We had previously disabled this functionality due to problems with overcharges and the lack of flexibility in adjusting the subscription when services were changed. These problems have not completely gone away, but recent changes to our billing system have advanced the detection of problems in this regard, so we feel confident enabling this function once more. With PayPal Subscriptions customers can subscribe to their monthly bill and have it paid automatically without re-authorization. This, much like direct debit, reduces the complexity of monthly payments for our customers.


Transaction Fees

A very important change that comes along with adding new payment gateways is that we are updating our transaction fees. Previously a flat 5% rate was applied to any invoice customers received. This is now changing to a new structure that applies different fees depending on the gateway used. This reflects the contractual conditions we have with the gateway providers and more accurately reflects the additional costs we incur for each gateway. Invoices are automatically updated when a different payment method is selected, so customers can see directly which one is better for them. We understand that due to many different circumstances some payment methods are less appealing than others, so we have tried to reduce the transaction fees where possible. The new transaction fees are as follows:

 

Gateway             Fee
PayPal              5%
Stripe              3%
Paysafe (new)       15%
Stripe SEPA (new)   3%
Bank Transfer       7%

 

Additional payment methods may come in the future; if customers have special requirements for payments they can, of course, contact us in this regard as always. We hope these new payment methods will provide customers with ways to reduce the hassle of monthly bills and provide more alternatives to those without access to the previously available methods.