Speeding Up Plone with the Eighteen Dragon-Subduing Palms: Part 5

Setting up squid
● More tips:
– While debugging your squid configuration, run squid from the command line and echo errors to the console:
● /usr/sbin/squid -d1
– To stop squid from the command line, use
● /usr/sbin/squid -k kill
– To reconfigure squid after modifying squid.conf, use:
● /usr/sbin/squid -k reconfigure
Setting up squid
● More tips:
– Look at squid's logs if you have problems
● /var/log/squid/cache.log – squid messages about its internal state
– If you notice that all of squid's external processes are dying, it probably means that you have a problem with your python path in iRedirector.py or squidAcl.py
– Try running these python files from the command line to see what's going on. Use “./iRedirector.py”, NOT “python iRedirector.py”, so the script runs with the interpreter named on its #! line, the same way squid runs it
● /var/log/squid/access.log – squid messages about cache hits and misses
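
A quick way to read access.log is to tally the result codes. The sketch below assumes squid's default native log format, in which the fourth whitespace-separated field looks like TCP_HIT/200; the sample lines are invented for illustration, and the function name is hypothetical.

```python
from collections import Counter


def count_result_codes(lines):
    """Tally squid result codes (TCP_HIT, TCP_MISS, ...) from access.log lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        # Assumption: native log format, 4th field is "RESULT_CODE/HTTP_STATUS".
        result_code = fields[3].split("/")[0]
        counts[result_code] += 1
    return counts


# Invented sample lines in the assumed native format.
sample = [
    "1136000000.123     12 10.0.0.1 TCP_MISS/200 5120 GET http://www.mysite.com/ - DIRECT/127.0.0.1 text/html",
    "1136000001.456      1 10.0.0.1 TCP_HIT/200 5120 GET http://www.mysite.com/ - NONE/- text/html",
    "1136000002.789      1 10.0.0.2 TCP_MEM_HIT/200 5120 GET http://www.mysite.com/ - NONE/- text/html",
]
counts = count_result_codes(sample)
hits = sum(n for code, n in counts.items() if "HIT" in code)
print(counts, "hits:", hits)
```

To run it against a real log, pass an open file instead of the sample list, e.g. count_result_codes(open("/var/log/squid/access.log")).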
Setting up squid
● Tips:
– iRedirector.py does URL rewriting
– Uses redirector_class.py as a helper
● Both iRedirector.py and redirector_class.py do debug logging
● Edit them and replace “debug = 0” with “debug = 1” if you have problems
Setting up squid
● Once you have squid working, It Just Works
● Setup can be a headache the first time
– Tips should help a lot
Configuring CacheFu for Squid
● Once squid runs, tell Zope about it
● Go to first pane of Cache configuration tool
– Indicate URLs of your site
● Include all URLs, e.g. http://www.mysite.com, https://www.mysite.com, http://mysite.com, etc.
– If squid is behind Apache, also give the URL of squid (typically http://localhost:3128)
Vary header and gzipping
● Set the Vary header (default should be OK)
– Vary header tells squid to store different versions of content depending on the values of the headers specified
– Vary: Accept-Encoding for gzip
● One version for browsers that accept gzipped content
● One version for those that don't
● Select gzipping method (default is recommended)
– Gzipping shrinks responses, which cuts transfer time over the network
– Content is cached in gzipped form, so each page is gzipped only once
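
The effect of the Vary header can be pictured as a cache key that includes the named request headers along with the URL. This is only an illustrative sketch; squid's real variant handling is more involved, and cache_key is a hypothetical helper.

```python
def cache_key(url, request_headers, vary):
    """Build a cache key from the URL plus the request headers named in Vary."""
    names = [name.strip().lower() for name in vary.split(",") if name.strip()]
    # Header names are case-insensitive, so normalize before looking them up.
    lowered = {k.lower(): v for k, v in request_headers.items()}
    return (url,) + tuple((name, lowered.get(name, "")) for name in names)


gzip_browser = {"Accept-Encoding": "gzip, deflate"}
plain_browser = {}  # sends no Accept-Encoding header

k1 = cache_key("http://www.mysite.com/", gzip_browser, "Accept-Encoding")
k2 = cache_key("http://www.mysite.com/", plain_browser, "Accept-Encoding")
print(k1 != k2)  # the two browsers get separately cached versions
```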
Demo
● Let's try it out!
● Tips:
– Use LiveHTTPHeaders to see whether you are getting cache hits
– Look at headers:
● X-Cache: HIT or X-Cache: MISS
– If you don't see any HITs, clear your browser cache manually and try again
– If that fails, something may be wrong
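
The same check can be scripted. The sketch below only classifies a set of response headers; it assumes squid reports values of the form "HIT from <host>" or "MISS from <host>", and cache_status is a hypothetical helper name.

```python
def cache_status(headers):
    """Return "HIT", "MISS", or None based on the X-Cache response header."""
    words = headers.get("X-Cache", "").split()
    if not words:
        return None  # header absent: the response did not come through squid
    first = words[0].upper()
    return first if first in ("HIT", "MISS") else None


print(cache_status({"X-Cache": "HIT from cache.mysite.com"}))
print(cache_status({"X-Cache": "MISS from cache.mysite.com"}))
print(cache_status({"Content-Type": "text/html"}))
```

The headers argument can be any mapping with a get method, for example the headers attribute of the response returned by urllib.request.urlopen.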
Strategy 3: Load Balancing
● Zope Enterprise Objects let you do load balancing
– ZEO server = essentially an object database
– ZEO client executes your python scripts, serves up your content, etc
– ZEO comes with Zope
● Set up multiple ZEO clients on multiple machines or multiple processors (single instance of Zope won't take much advantage of multiple processors)
Setting up ZEO
● You can transform a Zope site into a ZEO site using the mkzeoinstance.py script in ~Zope/bin
● Change a few lines in ~instance/etc/zope.conf and ~instance/etc/zeo.conf and you are good to go
● See Definitive Guide to Plone, Chapter 14
http://docs.neuroinf.de/PloneBook/ch14.rst
Squid + ZEO
● Main idea: give your proxy cache lots of places from which to get content it can't serve
● Squid can in theory take care of load balancing
● I would use pound instead
– pound = load-balancing proxy designed for Zope
http://www.apsis.ch/pound/
– Put pound between squid and ZEO clients
– Big advantage if you use sessions – pound keeps client talking to same back-end server
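
Session affinity can be sketched as follows. This is not pound's implementation, only a minimal illustration of the idea: requests without a session are spread round-robin, and a given session always goes back to the back end it was first assigned. The class name and back-end addresses are hypothetical.

```python
import itertools


class StickyBalancer:
    """Round-robin across back ends, but pin each session to one back end."""

    def __init__(self, backends):
        self._round_robin = itertools.cycle(backends)
        self._assigned = {}  # session id -> back end

    def pick(self, session_id=None):
        if session_id is None:
            return next(self._round_robin)  # no session: plain round-robin
        if session_id not in self._assigned:
            self._assigned[session_id] = next(self._round_robin)
        return self._assigned[session_id]


balancer = StickyBalancer(["zeoclient1:8080", "zeoclient2:8080"])
first = balancer.pick("session-abc")
again = balancer.pick("session-abc")
other = balancer.pick("session-xyz")
print(first, again, other)
```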
Resource requirements
● My site: 20K page views/day
– 1 squid instance, 1 ZEO client
– 2.4 GHz P4 + 1G RAM
● plone.org:
– 1 squid instance + 2 ZEO clients
– 2x 3GHz Xeon box with 2 GB of RAM
● Bulk of load is from authenticated clients
● Don't need that much power, especially if most clients are anonymous
● squid is very efficient
● Main requirement is lots of memory for Zope
Strategy 4: Use Entity Tags
● ETags let us do smart browser caching
● The idea:
– ETag = arbitrary string, should have the property:
● If I have 2 files with same ETag, files should be the same
– Send an ETag to browser with a page
– Browser caches the page
– Before rendering from cache, browser sends ETag of cached page to server
– Server responds with Status 304 + no page (meaning cached stuff OK) or Status 200 + new page
ETags
● What are good ETags?
– Depends on what we are serving up
● Example: Images
– 2 images with same URL and same modification time are probably the same
– ETag for images, files can just be last modified time
– ETags not really useful for files and images, since we can do a conditional request based on modification time
ETags
● Example: document
– ETag for document should include modification time
● That lets us distinguish different versions of the doc
– Should depend on authenticated member
● Since we have personalization in document view
– Should depend on state of the navtree, other portlets
Setting ETags
● CacheFu provides an easy way to generate ETags
● Go to policy for Plone content in Cache configuration portlet
– Look at ETag section
– Ingredients for building an ETag
● Use member ID (personalization)
● Time of last catalog modification (covers age of document + navtree state)
● REQUEST vars: month, year, orig_query (covers state of calendar portlet)
● Time out after 3600 secs
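
How such ingredients might combine into a single string can be sketched as below. CacheFu's actual ETag format is not shown here, so the "|" separator, the time-bucket approach to the timeout, and the build_etag name are all assumptions made for illustration.

```python
import time


def build_etag(member_id, catalog_modified, request_vars, timeout=3600, now=None):
    """Join the ETag ingredients into one string (hypothetical format)."""
    now = time.time() if now is None else now
    # Assumption: the timeout is modelled as a bucket number that changes
    # every `timeout` seconds, so the ETag expires even if nothing else changes.
    time_bucket = int(now // timeout)
    parts = [member_id or "", str(catalog_modified)]
    parts += [str(request_vars.get(name, "")) for name in ("month", "year", "orig_query")]
    parts.append(str(time_bucket))
    return "|".join(parts)


request_vars = {"month": 3, "year": 2006}
etag_alice = build_etag("alice", 1141200000, request_vars, now=7200)
etag_bob = build_etag("bob", 1141200000, request_vars, now=7200)
print(etag_alice)
print(etag_bob)
```

Any change to an ingredient (a different member, a catalog update, another calendar month) yields a different string, which is what makes the browser or RAM cache discard the old page.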
ETags
● ETags useful for 2 things
– First, allows for smart conditional browser caching
● If document changes or something in document's containing folder changes or calendar changes or logged in member changes, ETag will change
– Second, provides a useful cache key for a RAM cache
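
The conditional exchange behind the first point can be sketched on the server side as follows. It is a simplification: a real If-None-Match header may be quoted or list several ETags, and respond is a hypothetical function, not Zope's API.

```python
def respond(request_headers, current_etag, render_page):
    """Return (status, headers, body) for a request that may carry an ETag."""
    if request_headers.get("If-None-Match") == current_etag:
        # The browser's cached copy is still good: send no page.
        return 304, {"ETag": current_etag}, ""
    # First visit, or the ETag has changed: send the fresh page.
    return 200, {"ETag": current_etag}, render_page()


def render_page():
    return "<html>front page</html>"


first_visit = respond({}, "etag-v1", render_page)
revisit = respond({"If-None-Match": "etag-v1"}, "etag-v1", render_page)
after_change = respond({"If-None-Match": "etag-v1"}, "etag-v2", render_page)
print(first_visit[0], revisit[0], after_change[0])
```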
PageCacheManager
● PageCacheManager stores full pages + headers in memory
– Uses ETags as cache key, so ETag is required
– ETags are set using CachingPolicyManager policy
● A page is cached if its template uses the Cache configuration tool to generate an ETag and its policy is not “Do not cache”
● CacheFu automatically associates templates that have ETags generated with PageCacheManager
● Content views are automatically cached in memory
PageCacheManager
● Try it out
● Look for X-Pagecache: HIT
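
The idea of a page cache keyed on ETags can be sketched in a few lines. This is not PageCacheManager's code; the PageCache class is hypothetical and only shows why a page with the same ETag never needs to be rendered twice.

```python
class PageCache:
    """Keep rendered pages and their headers in memory, keyed by ETag."""

    def __init__(self):
        self._pages = {}  # etag -> (headers, body)

    def get_or_render(self, etag, render):
        if etag in self._pages:
            headers, body = self._pages[etag]
            hit_headers = dict(headers)
            hit_headers["X-Pagecache"] = "HIT"  # mark responses served from RAM
            return hit_headers, body
        headers, body = render()  # expensive: runs the page template
        self._pages[etag] = (headers, body)
        return headers, body


def render():
    return {"Content-Type": "text/html"}, "<html>document view</html>"


cache = PageCache()
miss_headers, miss_body = cache.get_or_render("alice|1141200000", render)
hit_headers, hit_body = cache.get_or_render("alice|1141200000", render)
print(hit_headers)
```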
Things you should know
● Some things to watch out for when digging deeper
– If browser has a page in hand, will do a conditional GET
● GET /foo
● If-None-Match: ETAG-OF-PAGE-IN-HAND
● If-Modified-Since: LAST-MOD-OF-PAGE-IN-HAND
– Squid can handle If-Modified-Since but is too dumb to deal with If-None-Match
– Any requests with an If-None-Match bypass squid
● Code in squidAcl.py is used to do this
More things you should know
● Squid is not typically very useful for caching content from authenticated users
– squidAcl.py causes squid to be bypassed if the user is authenticated
● Squid IS useful for caching images and files even if user is authenticated
– Code in squid.conf tells squid to always use the cache for files ending with .js, .css, .jpg, etc
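
Taken together, the bypass rules amount to a decision like the one below. This is not the code in squidAcl.py or squid.conf; it is a sketch under stated assumptions: authentication is detected through Plone's default __ac cookie, the extension list is limited to the three named above, and the static-file rule is checked first.

```python
from urllib.parse import urlparse

STATIC_EXTENSIONS = (".js", ".css", ".jpg")  # the "etc" is left out here
AUTH_COOKIE = "__ac"  # assumption: Plone's default authentication cookie


def use_cache(url, request_headers):
    """Decide whether squid should answer from its cache (hypothetical helper)."""
    path = urlparse(url).path.lower()
    if path.endswith(STATIC_EXTENSIONS):
        return True  # images and files: always use the cache
    if "If-None-Match" in request_headers:
        return False  # squid cannot evaluate ETags, so pass through to Zope
    cookies = request_headers.get("Cookie", "").split(";")
    if any(c.strip().startswith(AUTH_COOKIE + "=") for c in cookies):
        return False  # authenticated user: bypass squid
    return True


print(use_cache("http://www.mysite.com/front-page", {}))
print(use_cache("http://www.mysite.com/front-page", {"Cookie": "__ac=abc123"}))
print(use_cache("http://www.mysite.com/plone.css", {"Cookie": "__ac=abc123"}))
```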
More things you should know
● Images and Files get routed through CachingPolicyManager through a nasty method
– Monkey patch associates them with DefaultCache
– DefaultCache is an HTTPPolicyCacheManager
● Existing caching policies assume that images and files do not have any security on them and are the same for authenticated and anonymous users
– It may be possible to work around this, but it will require some effort
Strategy 5: Optimize Your Code
● Don't guess about what to optimize – use a profiler
● Several available
– Zope Profiler:
http://www.dieter.handshake.de/pyprojects/zope/
– Call Profiler:
http://zope.org/Members/richard/CallProfiler
– Page Template Profiler:
http://zope.org/Members/guido_w/PTProfiler
● Identify and focus on slowest macros / calls
Code Optimization: Example
● Suppose you find that a portlet is your bottleneck
– Calendar portlet, for example, is pretty expensive
● How to fix?
● Idea: don't update calendar portlet every hit
– Update, say, every hour
– Cache the result in memory
– Serve up the cached result
● Similar idea applies to other possible bottlenecks:
– Cache the most expensive pieces of your pages
RAMCacheManager
● RAMCacheManager is a standard Zope product
● Caches results of associated templates / scripts in memory
● Caveats:
– Can't cache persistent objects
– Can't cache macros
● Calendar portlet is a macro – how can we cache it?
Trick: Caching Macro Output
● Idea:
– create a template that renders the macro
– output of template is snippet of HTML, i.e. a string
– cache output of the template
Caching the Calendar
● Step 1: Create a template called cache_calendar.pt:
<metal:macro use-macro="here/portlet_calendar/macros/portlet" />
● Step 2: In the ZMI, add a RAMCacheManager to your site root
● Step 3: In the RAMCacheManager, set the REQUEST variables to AUTHENTICATED_USER and leave the other settings at their defaults (this caches one calendar per user)
Caching the Calendar
● Step 4: Associate cache_calendar.pt with your new RAMCacheManager. Output of cache_calendar.pt will now be cached for 1 hour.
● Step 5: In your site's properties tab, replace here/portlet_calendar/macros/portlet with here/cache_calendar
● Voila!
● Use RAMCacheManager to cache output of slow scripts, etc.
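
What the steps above achieve can be sketched in plain Python: one cached HTML snippet per user, recomputed only after an hour. This is not RAMCacheManager's implementation; TimedCache and render_calendar are hypothetical names, and the injectable clock exists only to make the expiry easy to demonstrate.

```python
import time


class TimedCache:
    """Cache computed values per key and recompute them after max_age seconds."""

    def __init__(self, max_age=3600, clock=time.time):
        self._max_age = max_age
        self._clock = clock
        self._entries = {}  # key -> (time stored, value)

    def get(self, key, compute):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]  # still fresh: serve the cached result
        value = compute()  # expensive: e.g. rendering the calendar macro
        self._entries[key] = (now, value)
        return value


render_count = 0


def render_calendar():
    global render_count
    render_count += 1
    return "<div>calendar rendering #%d</div>" % render_count


fake_now = [0.0]
cache = TimedCache(max_age=3600, clock=lambda: fake_now[0])
a1 = cache.get("alice", render_calendar)  # rendered
a2 = cache.get("alice", render_calendar)  # served from cache
b1 = cache.get("bob", render_calendar)    # separate entry per user
fake_now[0] = 3601.0
a3 = cache.get("alice", render_calendar)  # expired, rendered again
print(a1, a2, b1, a3)
```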