SymfonyCon 2019 Amsterdam presentation by Andre Rømcke.
Symfony Cache has been around for a few releases. But what is happening behind the scenes? This talk focuses on how it works, down to the detail level on Redis for things like data types and Redis Cluster sharding logic, how it differs from Memcached, and more.
Hopefully you’ll learn how to get optimal performance, what opportunities exist, and which pitfalls to try to avoid.
So, quite a bit. I'll just do a quick recap on what it is for those that haven't used it. Cache tagging: what is that? Is anyone using that already? Yeah, there's a few, not that many of us. There are quite a lot of Symfony applications using it under the hood, so you might not actually know it. Then shortly on Memcached versus Redis, just for awareness, and on Redis and Redis Cluster, what they are. Then some issues we had over the last year in the Redis adapter, and what we ended up with, the RedisTagAwareAdapter. Then I'll do a short demo, just to show how you can add caching to your application. Nothing fancy there. And then I'll go through some more advanced edge cases to be aware of if you do this, and also where the new adapter fits compared to the old one.
So, me: I'm working for eZ Systems, and have been doing that for a long time. I try, whenever I have the time, to contribute to Symfony and FOS, especially FOSHttpCache. Also Composer, a long time ago, PHP-FIG a long time ago, and Docker Compose. eZ is kind of a global, small company: 75 people spread around, with the community and partners. Beyond that, we make a CMS, today called eZ Platform, in the past called eZ Publish. You might have heard of it. It's very extensible, very feature rich and flexible, and it can be used headless or for full use. We've been on Symfony since 2012, and we're actually in the middle of getting a new version out soon, on either 4.4 LTS or 5, or both. We haven't really decided. We have commercial flavors of that, eZ Platform Enterprise and Commerce, which add additional features.
So in our case, we started to use Symfony Cache back in 2017. We used Stash, if you've ever heard about that. We pulled that out and put in Symfony Cache, to instead use the cache tagging there, because we wanted to take a different approach to how we clear cache. If you are aware of Stash, it has something called hierarchical cache. So we can, for instance, say I want to delete "locations/" and it will delete all locations. If that's an entity, say you have a blog, you can delete all blogs. But if you have something that goes across all these trees or hierarchies, then it becomes a bit more difficult. And you also end up clearing way too much cache in some cases when this gets complex. So after moving to that, we found some issues. So, as already mentioned, we in the end decided to contribute optimized tag adapters for Redis and the filesystem.
So, a quick recap on Symfony Cache for those that haven't used it yet. PSR-6 compliant cache adapters, plus plus, so it adds additional features on top, and it needs to be fast. I can sign on that: compared to Stash, it's definitely faster.
It's also progressively being used all over the place in Symfony itself. So you'll probably see it being used in new places.
There are a lot of adapters right now. This is maybe not even complete, but there's APCu, a PHP array in memory, chaining several, and using Doctrine's existing adapters. There's Filesystem, and there's PHP files and PHP array using OPcache for those cases where you have cache that never changes. And then there's a proxy for using other PSR-6 caches. There's Redis and there's Memcached. And finally there is something called TagAware. So this is kind of a special kind. This is not a backend per se. This is something that wraps a backend. And to go a bit into tagging: what do we use that for? How do we use it? To try to be generic, you can say that, depending on the application, a tag can be an entity type, or it can be a placement, if you have some kind of placement in your system for your entities, or a variant type in commerce systems, maybe. So a lot of different things depending on your system.
And where this is relevant is if you have operations in your application that affect quite a big subset of those. Not the whole thing, but definitely a part, like a change to a variation type affecting all the variations of that specific kind, for instance. So affecting a bulk of entities, but not all. With cache tagging, you can basically directly invalidate on this. Jumping to an eZ example, we have the key for eZ content. In our case content can be anything: articles, blogs, whatever. So there's ID 66, there's type 2, which is an article, I think, and location 44. So we tag where in the content tree we place this, so that if there is an operation on location 44 and everything beneath it, we can say to the cache: please clear this, please clear path 44.
So, to jump to PHP code and look at what this looks like: actually it's quite simple. There's just one method added to what PSR-6 provides, invalidateTags(). But of course, in the details, it does imply a lot of concepts. It does imply a secondary index, it does imply keeping track of this, and it does imply setting those tags as well, if you want. As mentioned, it wraps your normal adapter and stores the tags in a separate key, which you don't have to care about. It's completely internal. When you look up your key, it will in parallel do a separate lookup to know which tags you have added to this cache item. And after knowing that, it will do a second lookup, or third, I'll show you in a second, to actually know if any of those have been invalidated. So if we look at actual output from the web profiler, we'll see it first do one round of lookups to the backend: it will look up what you asked for, and it will look up which tags are assigned to this. And then it will look up all the tags you have assigned. In this case, it's a timestamp. So in this case there's false, there's nothing set yet. I just booted up the application. But if there were invalidations, it would return a timestamp.
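The lookup sequence described above can be modeled in a few lines. This is a minimal sketch in plain Python rather than PHP, so that it runs without Symfony or a cache server; it is not Symfony's implementation. The class names, the key prefixes and the use of a counter instead of a timestamp as the invalidation marker are all assumptions made for illustration.

```python
class CountingBackend:
    """In-memory stand-in for a cache server that counts round trips."""

    def __init__(self):
        self.data = {}
        self.round_trips = 0

    def get_many(self, keys):
        self.round_trips += 1
        return {k: self.data[k] for k in keys if k in self.data}

    def set_many(self, values):
        self.round_trips += 1
        self.data.update(values)


class VersionedTagCache:
    """Wraps a backend and checks tag versions on every read (hypothetical model)."""

    def __init__(self, backend):
        self.backend = backend

    def _tag_versions(self, tags):
        tags = list(tags)
        found = self.backend.get_many(["tag-version:" + t for t in tags])
        # A tag that was never invalidated has no entry yet; treat it as version 0.
        return {t: found.get("tag-version:" + t, 0) for t in tags}

    def save(self, key, value, tags):
        versions = self._tag_versions(tags)
        # The value and the tag versions seen at save time are stored side by side.
        self.backend.set_many({"item:" + key: value, "item-tags:" + key: versions})

    def get(self, key):
        # Round trip 1: the value and the tags stored with it.
        found = self.backend.get_many(["item:" + key, "item-tags:" + key])
        if "item:" + key not in found:
            return None
        saved = found.get("item-tags:" + key, {})
        # Round trip 2: the current version of each of those tags.
        if saved and self._tag_versions(saved.keys()) != saved:
            return None  # at least one tag was invalidated after the item was saved
        return found["item:" + key]

    def invalidate_tags(self, tags):
        current = self._tag_versions(tags)
        self.backend.set_many({"tag-version:" + t: v + 1 for t, v in current.items()})


backend = CountingBackend()
cache = VersionedTagCache(backend)
cache.save("content-66", "article body", ["type-2", "location-44"])
cache.get("content-66")                 # costs two round trips in this model
cache.invalidate_tags(["location-44"])  # the stored item is untouched, only the tag marker changes
```

The point of the model is the cost profile: invalidation is cheap because it only touches the tag marker, but every read of a tagged item pays for a second round trip.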
We'll return to the new adapters around this. But first, what about Redis and Redis Cluster? Does anyone here use Redis, or maybe specifically Redis Cluster, already? Okay. Does anyone use other things, like from Netflix, or other solutions?
So, starting with Redis: it's not Memcached, it's way more advanced. It has a lot of different data types. It's not just strings. There's lists and sets, so a list within a key, sorted sets, hashes, bitmaps, HyperLogLogs, which I'm not going to try to explain this evening, and streams, which is a complex talk in itself. There are several talks about it you can find online if you're interested; it's quite a recent feature. And on these data types, there are tons of operations, so it's not just get, set, and that's almost it. There is tons more. If we talk about strings, there's GET, SET, APPEND, BITPOS, decrement, increment and so on, if it's an int value in the string. There are also generic commands, cluster commands for doing specific things, and there's transactions.
There's also something called pipelining, for doing several operations in one call. And for sets, which is what we use in RedisTagAwareAdapter, there is: add a member, remove a member, pop a member, diff when you have two sets, intersections, unions, moving members from one set to another. So there are a lot of advanced operations that you can do on this. You can use this for something cool in your application without it actually being a cache. It can be a storage. I'll get back to that. And also, if you compare Redis to Memcached, it allows you to control much more what happens in a lot of different cases. One important one is what happens when it reaches max memory. When it reaches maximum memory, you can tell it if it should randomly clear things, if it should respect TTL, or if it should go by least recently used or least frequently used objects. And you can choose if that should only apply to those that have a TTL, or to everything, even those that don't have one.
So, onto Redis Cluster. This allows you to scale Redis by running several instances. You can also run several on the same server. So if you want to handle more load and you have a lot of spare CPU and memory, you can even do it per server.
What it does for you is coordinate the cache across hash slots, which live within each server, deal with the replication, and also figure out who is the master. So if something goes down, it will communicate with the different servers, elect a new master in the background and deal with this itself. Thinking back to all the different advanced options I talked about earlier with Redis, there are some limitations when you use a cluster, unlike Memcached. But of course it supports everything Memcached supports, also in cluster mode. So, talking about limitations: you cannot, for instance, do a pipeline that would have to go to more than one server.
In the case of the client most of you use, the phpredis one, it mainly supports doing several operations at once using simpler commands like MGET and MSET. And to give an example, if you do a rename of a key from one name to another, this might result in the following exception: "CROSSSLOT Keys in request don't hash to the same slot". There is a way around this that I'll show you later. But it's a thing to be aware of. You need to kind of work around those things.
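To see why two key names can end up on different slots, here is a small sketch of the slot calculation as the Redis Cluster specification describes it: CRC16 (the XMODEM variant) of the key, modulo 16384, where only the part between the first `{` and the next `}` is hashed if that part is non-empty. It is written in plain Python for illustration; the function names and the example key names are my own.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0, computed bit by bit."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str) -> int:
    """Return the cluster slot (0..16383) that a key name maps to."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty {...} section counts as a hash tag.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    return crc16_xmodem(data) % 16384


# Two unrelated names are hashed independently, so a multi-key command such as
# RENAME can span two slots and be rejected with CROSSSLOT.
old_slot = hash_slot("post-list")
new_slot = hash_slot("post-list-renamed")

# With a shared hash tag, only "post-list" is hashed for both names.
same_slot = hash_slot("{post-list}") == hash_slot("{post-list}renamed")  # True
```

Whether the first pair actually collides depends on the names; the hash tag is what makes the outcome predictable.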
Shortly on Memcached versus Redis. Who here uses Memcached? Okay. Who used to use it and moved to Redis? Yeah, there are a lot of those, including me. It's good to know what the differences are, because Memcached actually has its strengths also compared to Redis.
I already talked about the strengths of Redis. You have the data types, you have the controlled eviction, you have persistence, you have the pipeline and Lua capabilities. And there is also a bunch of other things, but those are the ones I felt were important. And then when it comes to multi-server, you start to see the difference, because Memcached is by default meant for running on several servers, while with Redis you have to introduce a different thing, Redis Cluster or other technologies, to run it across several servers. So that's one difference. The second one is multithreading. Memcached does multithreading, especially in the last five, six, seven years, while Redis today is single-threaded. So this means, as I mentioned earlier, if you want to take care of extra load and use a lot of CPUs, you would need to use Redis Cluster to spawn additional processes to take advantage of those.
Redis 6 does come with some background threading capabilities, but it's more for the slow operations. Okay. So having set some background and some context here, let's talk about the new adapters. There's two of them: FilesystemTagAwareAdapter and RedisTagAwareAdapter. What they do on the high level is basically this: instead of doing several lookups like the TagAwareAdapter does, they move tags to be like a relation. So instead of using the expiry time and looking that up, it stores the tags as a separate thing and doesn't have to do a lookup on them. This means it has one round trip to the server instead of two.
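The relation idea can be sketched as follows. This is a minimal in-memory model in plain Python, not the adapters' actual code: each tag maps to the set of cache keys that carry it, a read touches only the item, and invalidation walks the relation and deletes the members. The class and attribute names are assumptions.

```python
class RelationTagCache:
    """Tags stored as tag -> set of keys, so reads never consult the tags."""

    def __init__(self):
        self.items = {}        # cache key -> value
        self.tag_members = {}  # tag -> set of cache keys carrying that tag
        self.lookups = 0       # counts reads only, to show the cost per get

    def save(self, key, value, tags):
        self.items[key] = value
        for tag in tags:
            self.tag_members.setdefault(tag, set()).add(key)

    def get(self, key):
        # One lookup: no tag versions or expiry times to compare against.
        self.lookups += 1
        return self.items.get(key)

    def invalidate_tags(self, tags):
        for tag in tags:
            # Drop the relation and every item it points at. A key may still be
            # listed under another tag; deleting it twice is harmless.
            for key in self.tag_members.pop(tag, set()):
                self.items.pop(key, None)


cache = RelationTagCache()
cache.save("page-1", "<html>...</html>", ["blog-post-7"])
cache.get("page-1")                     # a single lookup
cache.invalidate_tags(["blog-post-7"])  # the work moves to invalidation time
```

The trade is visible in the model: reads get cheaper, while invalidation has to do real deletes and the relation itself has to be kept around for as long as the items exist.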
In the case of the filesystem adapter, it uses a file for tags. So every time something is added, it appends to the file. If there is an invalidation, I think it moves it and then reads it and clears all the keys in it. In Redis, in RedisTagAwareAdapter, it is using a set, so storing that as a relation, and it stores it without expiry. This is one of the tricky things with this adapter. It needs to do that because if the tags are suddenly evicted before the cache itself, and you clear by a tag that has been evicted, you basically end up with stale cache. So that's a clear limitation. So why all this effort to try to do one round trip lookups? We have cases with customers where they're on Amazon or something else, and there is often quite some latency to reach the cache backend, whether it's Memcached or Redis.
It can be the same. What we had from customers was some were in the range of 0.2 to 0.5 milliseconds per lookup. And if you are on a small instance of Redis or ElastiCache, it will be even slower. So for a simple page, no problem. That might contribute something like 5 to 20 milliseconds of the total time spent to generate that page. But for us, me and Yonnie in the front there, coming from eZ, we make a CMS, and there are things like complex newspapers that our customers try to load. In cases like this or worse, where there are thousands of articles or different kinds of content being shown on one page, all of them need some lookup to the cache, and this starts to contribute a lot to the load time, unless the page is cached by Varnish. So whenever Varnish isn't caching the page, just the lookups to Redis or Memcached will contribute up to one second.
And that's just with one Redis instance; it was in some cases worse. So to talk briefly about the optimizations that have been done in Symfony Cache over the last year: we first figured out that it was using pipelining. So when you moved to Redis Cluster, if you were asking for five things at the same time, it would do five calls. So instead of using MGET, it ended up being a single GET in a foreach for each and every one. So back and forth, ping pong, to the Redis server. So this was fixed.
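The cost of that ping pong is simple arithmetic. The sketch below, in plain Python, uses the 0.5 millisecond round trip mentioned earlier; the figure of 2,000 lookups per page and the batch size of five are assumed numbers for illustration, not measurements from the talk.

```python
import math


def total_wait_ms(lookups, round_trip_ms, keys_per_call=1):
    """Time spent waiting on the network when keys are fetched in batches."""
    calls = math.ceil(lookups / keys_per_call)
    return calls * round_trip_ms


# One GET per key: every lookup pays the full round trip.
one_by_one = total_wait_ms(2000, 0.5)                 # 1000.0 ms

# Five keys per MGET: the same data in a fifth of the calls.
batched = total_wait_ms(2000, 0.5, keys_per_call=5)   # 200.0 ms
```

The model ignores server time and payload size, which is reasonable here because the talk's point is that latency to the backend dominates.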
There was also a small thing around having to do versioning of cache in Redis Cluster. This added additional lookups to know which version the cache is on now. And then if you clear everything, the version will be bumped. And this wasn't needed anymore. Nicolas did some other improvements here and there, and he fixed this one. He fixed a lot of things, also on the TagAwareAdapter. It's way better now. It does some micro-TTL caching. So if you do a lookup on those tags to know the expiry time, it doesn't have to look that up every single time during your request. So that's a great improvement as well. But a lot of the improvements you need to deal with are actually going to be in your application.
So in our case, on the eZ Platform side, we changed things to make sure we could take advantage of looking up several things at the same time. Instead of having an API where you load one thing at a time, which our users were using, we exposed more APIs to load several things at the same time. We took advantage of it ourselves in the cases we could, and we started to encourage our users to use it also. We also then, more importantly, introduced RedisTagAwareAdapter. And, maybe what had the biggest impact for us, we introduced an application-specific in-memory cache. I was kind of discussing with Nicolas about trying to build something generic for Symfony around this. But it is application specific. It's only you and your application that know what can safely be put in memory for a short amount of time. So yeah, it's possible to do it, but then be aware of where you can do it. In our case, we do it for maybe a second on metadata lookups, things that don't change too often. And for anything that is related to the entities, so the content in our case, which can update quite often, we have a very, very short, 40 to 60 millisecond burst cache, just to make sure that if there's inefficient code on top looking up the same data several times, we don't have to ask Redis about that.
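A burst cache of this kind can be very small. The following is a sketch in plain Python under stated assumptions, not eZ Platform's implementation: the `BurstCache` name, the `loader` callable standing in for the Redis-backed lookup, and the injectable clock are all hypothetical.

```python
import time


class BurstCache:
    """Remembers values in process memory for a very short time."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self.loader = loader      # called on a miss, e.g. the real cache lookup
        self.ttl = ttl_seconds
        self.clock = clock        # injectable so the behavior can be tested
        self.entries = {}         # key -> (time stored, value)

    def get(self, key):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]       # still fresh: no call to the backend
        value = self.loader(key)
        self.entries[key] = (now, value)
        return value


def load_from_backend(key):
    # Placeholder for the slower lookup that goes over the network.
    return "value for " + key


content = BurstCache(load_from_backend, ttl_seconds=0.05)  # about 50 ms, for entities
metadata = BurstCache(load_from_backend, ttl_seconds=1.0)  # about a second, for metadata
content.get("content-66")
content.get("content-66")  # served from memory if called within the TTL
```

Which data tolerates which TTL is exactly the application-specific part: a value served from here can be up to one TTL out of date, and tag invalidation does not reach it.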
So to talk about an end result example: in our case we had a page, a dashboard, doing 17,000 lookups to Symfony Cache, and on Redis Cluster, due to the issues I talked about before, that was up to 40,000 to 60,000 lookups. So we're talking around a 30 second wait time for the user. The editor is sitting there waiting for the dashboard to load. Those 30 seconds were just for Redis, and then an additional 200 milliseconds or something just to do the PHP execution. After all those fixes I talked about, we went down to 63, so today it's almost not even showing up on the profiler when you're running in dev mode. So it's really a great improvement. And if we hadn't done it, we wouldn't have been able to launch a couple of larger customer projects we've done in the last half year. So it was definitely necessary. Okay. Before I go on to edge cases and some things to be aware of, I'll briefly do a small demo. I'll need to switch screen.
So, has anyone here tried out the Symfony Demo, or looked at it or used it? Yeah, some nodding. It's quite a simple application. It has blog posts and comments on them and some editorial tags, and it has an admin backend to let you log in and edit this. And it's all pure Symfony. So, we start the server and make sure that we have cleared cache, to make sure that it's correct. Okay. And that's on the right. Yeah.
So clearing cache again.
So, on first load now, you can see at the bottom there... is it readable? No? Better? Let's do that here also. Okay. So you see at the bottom there, there's three lookups to the database and 174 lookups to the cache. Actually, if we look at it, it's a lot of save requests: get items, save, get items, save, save. And actually it's annotations in Symfony itself doing most of that. So you can see we have 14 lookups done by the cache backend. And then if I go back and load again, there are no database lookups, and now there's 23 lookups to cache. So what did I do to the demo application? If we have a look in the code here, let's zoom in. You might not see the tree on the side here, but basically in the demo there's a BlogController for the front and a BlogController for admin. Is this one readable? Nope.
So what's been done here? I imported a couple of interfaces. ItemInterface is a specific thing here, you'll see it later. But TagAwareCacheInterface is the most important thing, and these are both from Contracts, by the way. Now we can make a change to the controllers. On the index controller, we can just type-hint that we want TagAwareCacheInterface and then get the whole cache adapter already wrapped in this tag-aware stuff, and we can start to use it. So the change to the controller here is then to generate a key that is unique to every request coming in: the page index, so which page we are on, and whether there's a tag filter, so the editorial tags shown in the admin or the demo. All of this needs to be added to the key to make sure we look up what is unique to the page. We check if it is a hit, and it's not.
So we need to do the loading. This is code that was there from before: get the latest posts. In the end, store it here. And then what is beyond normal PSR-6 is this part, setting the tags. I put it into a separate function here to make it a bit more readable, but basically it adds one global tag for the list itself, and I'll show you why later. And then also, for each and every post that is going to be displayed on this page, it adds a tag with the ID of that post. So it returns an array of tags, and this is stored together with the cache. And we now have a relation. Not a strong one, not a foreign-key-checked one or anything like that, but we have some kind of a relation. So, if I now try to edit this post, will it work? So we have the first one here. Let's just do something simple: edit, you have the tags there for the editorial part, and save. In the backend, the edit shows up, but not in the frontend. So I need to do something more. I need to also take care of the admin side of things. So, doing that.
So to go over what has been changed: if you go to BlogController again, and now I need to zoom, yeah, TagAwareCacheInterface, same as before. Index is not so important; we don't need to cache things in admin, maybe. New: when a new Post is created, we type-hint that we want a cache, and in this case, when there is a valid change, we need to clear the index, because the ID is new. If we clear on the ID, the right ones won't be cleared. So we clear the index to be sure that whichever page this ends up on, it will be there. And secondly, when there is an edit, same thing: type-hint the TagAwareCacheInterface, and in this case we can safely invalidate on the one with the ID. So in this case, 'blog-post' . $post->getId(), affecting just the caches where this one showed up. There is actually a bug in this: if the edit changes the post's tags, so that this blog post should suddenly show up on a different list, it's not going to show up there before the cache expires. But yeah, that's the life of caching.
So now we have checked this out, and we should, I think, be able to just edit again.
So now it's updated successfully, and it shows up. So that's the basics of using cache tagging. The last thing I wanted to show briefly was how you can just switch to the new adapters. So at the end of this, I'm pulling in Symfony 4 here to use the latest version, and in the end configuring App to be a new service, setting up a new service and setting it to the new class. And I won't demo this because it will just fail, so I'll save you that time. But I'll talk a bit about the downsides and the things you need to think about around this.
So, yeah, I already mentioned one cache bug there, and there can be others. So, some things to be aware of around caching: race conditions. It's a common one. For instance, in order to load your entity you might have to do two lookups to the backend: first to figure out what's the latest version or something like that, and then secondly do the lookup on that. Those kinds of situations can be prone to race conditions. For instance (there are many other cases), if in between those two calls someone publishes a new version, it will look up the wrong version, and your system might act wrongly by using that.
When caching data, kind of the case I was showing here, if you are using transactions with complex things going on in those transactions, you'll have to deal with that. Hopefully we can find some solution in Symfony Cache to make it simpler. But what we opted to do is disable caching during transactions and make sure that we're not sending updates to the servers while we are within the transaction, but just store whatever should be changed. So when you hit the last commit, we commit all the changes to cache that should happen. Without doing that, you end up with a lot of strange stale cache situations.
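The approach described here, holding back cache changes until the outermost commit, can be sketched like this. It is a minimal model in plain Python, not eZ Platform's code: the class name, the `invalidate` callable standing in for the real cache, and the choice to discard pending tags on rollback of the outermost transaction are assumptions.

```python
class TransactionAwareInvalidator:
    """Collects tag invalidations during a transaction and sends them on the last commit."""

    def __init__(self, invalidate):
        self.invalidate = invalidate  # callable taking a list of tags
        self.depth = 0                # nesting level of open transactions
        self.pending = set()

    @property
    def cache_enabled(self):
        # Reads should bypass the cache while a transaction is open.
        return self.depth == 0

    def begin(self):
        self.depth += 1

    def invalidate_tags(self, tags):
        if self.depth > 0:
            self.pending.update(tags)  # remember, but do not talk to the server yet
        else:
            self.invalidate(sorted(tags))

    def commit(self):
        if self.depth == 0:
            raise RuntimeError("commit without an open transaction")
        self.depth -= 1
        if self.depth == 0 and self.pending:
            tags = sorted(self.pending)
            self.pending.clear()
            self.invalidate(tags)      # one flush after the outermost commit

    def rollback(self):
        if self.depth == 0:
            raise RuntimeError("rollback without an open transaction")
        self.depth -= 1
        if self.depth == 0:
            self.pending.clear()       # assumed: nothing was persisted, nothing to clear


sent = []
invalidator = TransactionAwareInvalidator(sent.append)
invalidator.begin()
invalidator.begin()                       # nested transaction
invalidator.invalidate_tags(["content-66"])
invalidator.commit()                      # inner commit: still nothing sent
invalidator.invalidate_tags(["location-44"])
invalidator.commit()                      # outermost commit: both tags flushed together
```

A rollback of an inner transaction is treated loosely in this sketch; a real implementation would have to decide what that means for tags collected at outer levels.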
And then there's async and stale cache. If you've used Varnish, you are probably aware of that. It's typically when you forget to clear something, or when you actually do it on purpose; then you need to be aware of how to deal with it. Then the adapters. So we have the RedisTagAwareAdapter. Due to the non-expiry of the tags, it has a requirement to use a volatile cache eviction policy or no eviction, which is the default in Redis. So this means, on the pro side, it has just one round trip to the server. On the negative side, one, it consumes more memory, and two, you risk running out of memory, because it will not be able to clear those tags. So that's kind of a problem for some. And then maybe this adapter is not for you, or you need to give much more memory to Redis.
So comparing that to RedisAdapter, the one that we've had for quite some time now: it uses less memory. All the memory can be freed. You can use whatever eviction strategy you want. On the negative side, it does two lookups. And then MemcachedAdapter compared to this: again, even less memory. Because of the single data type, and probably other reasons, it's more efficient in how it stores things. All of it can be freed, and, at least in smaller setups, it's normally capable of handling more traffic because of its multithreaded nature. But not everyone agrees with that. On the con side, there are two lookups to do when you use it with tagging. So this is all in the context of when you use tagging. If you don't use tagging, there will just be one lookup, and then you don't need the RedisTagAwareAdapter in the first place.
Lastly, some details on the RedisTagAwareAdapter. Just like RedisAdapter, it uses MGET to look up everything it needs. So one call, spread around on all servers when you're on Redis Cluster, so in parallel. And on the invalidation there are a few things. Here, showing how the command would look on the command line: it does a rename, then the existing name, and then it does the curly brackets there. That's the trick it needs to do to avoid hitting that cross-slot issue I talked about earlier, because now we are telling the client: hey, we want to put it on the same slot as that key in the first place, but we add an additional suffix to make sure this is now our own unique one that we can deal with. Specifically, this temp suffix is a unique hash, so it's not one fixed string. So we can safely use this in our process without interference from other processes. We can read the members, and we can delete: the set itself and the keys within the set, so the members. So we can safely do this all the way until it's done without interference from other processes, and yup, done. And that is the last one.
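The rename-then-drain sequence can be modeled with a plain dictionary standing in for Redis. This is a sketch in Python of the steps as described above, not the adapter's code: the store layout, the function name and the use of a random UUID as the unique suffix are assumptions.

```python
import uuid


def invalidate_tag(store, tag_key):
    """Detach a tag's set under a private name, then delete its members."""
    if tag_key not in store:
        return 0
    # Wrapping the original name in curly brackets makes it the hash tag, so the
    # temporary key lands on the same cluster slot as the original. The random
    # suffix keeps concurrent invalidations from sharing a temporary name.
    temp_key = "{" + tag_key + "}" + uuid.uuid4().hex
    store[temp_key] = store.pop(tag_key)   # stands in for RENAME
    # From here on, new items tagged by other processes go into a fresh set
    # under the original name and are not affected by this run.
    members = store.pop(temp_key)          # read the members, delete the set
    for key in members:
        store.pop(key, None)               # delete the cached items themselves
    return len(members)


store = {
    "tag:blog-post-7": {"page-1", "page-2"},
    "page-1": "<html>1</html>",
    "page-2": "<html>2</html>",
    "page-3": "<html>3</html>",
}
removed = invalidate_tag(store, "tag:blog-post-7")  # 2
```

In the dictionary model every step is trivially atomic; against a real server the rename is the step that makes the rest safe, because it takes the set out of reach of other writers in a single command.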
Someone wanted to have a picture. This will be online, by the way. Okay. So that's the end. Any questions around any of this? Yeah? "Are there any limitations on the tags?" How many, you mean? "Yeah, or how long they can be." The limitation of the set is 4 billion members. In Symfony 4.3 we had a limit in PHP of 2 billion, because we used SPOP, but now we're using it as it is, so it's 4 billion. Okay. Any more questions, come by me later. I think everyone is ready for lunch. Thank you everyone.