Since yesterday I have been seeing an exceptionally high ratio of failed or significantly delayed S3 operations. As you may know, uKeeper's internals run on AWS and make use of several services such as S3, SQS, DynamoDB and some others. S3 is especially important for the distributed data retrieval process. Practically all article-related resources, such as images and other attachments, come from S3.
S3 works as eventually consistent storage, which means a simple thing: in some cases data written to S3 is not immediately visible to all readers. That is fine if readers expect such behavior and can wait long enough for the updated data to become available. uKeeper is one of those smart readers and has worked this way since day one. Usually the waiting period was 100-200 ms, in very rare cases 500 ms. As a paranoid developer I put in place a waiting period of up to 15 sec, but since Apr 2012 I have seen only 4 cases with unusually high latency (~3 sec).
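The "wait until the data becomes visible" approach described above can be sketched roughly as follows. This is only an illustration, not uKeeper's actual code: `wait_until_visible`, the `fetch` callable and the backoff parameters are hypothetical names, and only the 15 sec upper bound comes from the text.

```python
import time
from typing import Callable, Optional


def wait_until_visible(
    fetch: Callable[[], Optional[bytes]],
    timeout: float = 15.0,       # upper bound mentioned in the post
    initial_delay: float = 0.1,  # assumed first pause between attempts
    max_delay: float = 1.0,      # assumed cap on the pause
) -> Optional[bytes]:
    """Poll `fetch` until it returns data or `timeout` seconds elapse.

    `fetch` is a hypothetical callable that returns the object's bytes,
    or None while the object is not yet visible to this reader.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        data = fetch()
        if data is not None:
            return data
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None  # gave up: the object never became visible in time
        time.sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay)  # back off, but stay under the cap
```

With a reader like this, the usual 100-200 ms delays cost one or two extra attempts, while the incident described here (objects missing for hours) ends in a `None` result after the timeout.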
However, since yesterday S3 has been acting differently: a relatively large share of submitted objects is still not available hours after the write! I have informed AWS support, and it looks like they are trying to fix it. The number of such incidents is decreasing dramatically, and over the last 12 hours I got just 4 delayed writes.
On the uKeeper side this issue initially caused “request rejected” errors for some users. As soon as the problem was detected I put in place a hot-fix that allows articles to be processed even if one of the resources failed or was delayed by S3. In this case the user will get the article, but it may have a missing picture. Please note that this is a really, really rare case now, and hopefully AWS will get it fixed completely very soon.
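The idea behind such a hot-fix (deliver the article and skip whatever could not be loaded) might look like the sketch below. It is an assumption-laden illustration rather than the real implementation: `collect_resources` and `fetch` are hypothetical names.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def collect_resources(
    keys: Iterable[str],
    fetch: Callable[[str], Optional[bytes]],
) -> Tuple[Dict[str, bytes], List[str]]:
    """Load every resource that is available and report the rest.

    `fetch` is a hypothetical loader: it returns the bytes for a key,
    returns None if the object is not visible yet, or raises on failure.
    """
    loaded: Dict[str, bytes] = {}
    missing: List[str] = []
    for key in keys:
        try:
            data = fetch(key)
        except Exception:
            data = None  # a failed read is treated like a delayed one
        if data is None:
            missing.append(key)  # article goes out without this resource
        else:
            loaded[key] = data
    return loaded, missing
```

The caller can then build the article from `loaded` and, if `missing` is not empty, send it with the corresponding pictures left out instead of rejecting the whole request.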
From this incident I learned a few important things about “what to do if AWS is acting strange”, and I am going to implement a new set of backup strategies for cases like this.
UPD: 01/05 13:57 CDT – The problem with S3 was resolved completely.
- Added ability to catch pages in PDF format. Adding %p to subject will do the trick.
- Added ability to catch pages in image format (png) – %i to subject
- Added new, regexp-based rules for deeper url matching. They allow tweaking the parsing for different parts of the same site if needed.
- Added detection and extraction of embedded youtube video
- Added support for non-latin URLs
- Implemented special parsing for twitter’s conversations
- Added support for twitpic images and surrounding text
- Fixed incorrect detection of image type in Evernote
- Fixed issue with relative image URLs without leading /
- Fixed issue with urls with “+” in web catcher
- Improved instapaper & pocket forwarding
- Header shows expanded url, not the short one
Both PDF and PNG generators are based on wkhtmltopdf.
- Added link forwarding for instapaper and pocket destinations
- Fixed extraction of multiple tags
- Improved processing of binary attachments
- Preview for cases with multiple URL matches
- More space-related URL fixes
- Some other improvements, mostly performance related
- Workaround for an Evernote issue that caused PDF dups.
- Added special parser and link converter for google-groups.
- Added ability to catch pages/links preprocessed by instapaper or readability.
- Better support for title meta-tag. It should address duplicated titles in some cases and generally makes title extraction smarter and more predictable.
- Fixed incorrect tag detection from subjects with links (empty body case).
- Improved XMPP link detection for adium client.
- Fixed parsing of urls with “+”
- Fixed processing of links with relative path like “../../”
- Proper cleaning of unicode’s non-breakable spaces in urls
- Added new jquery selector-syntax for the new custom parser
- Added support of custom titles for selectors-based parser
- Special support for new, dynamic blogger’s themes
- Improved resiliency for most of S3-based operations
- End-to-end fully automatic monitoring systems for regular and full capturing
- Fixed processing of tweets with image only and with internal redirects
- Fixed reporting and base processing for links with hash character
- Migrated to latest stable Netty 3.5.5
- More fixes for problematic redirects
- Simplified download tasks manager
- Custom parsing settings available for more testers
- Added support of multiple links via jabber (up to 10 links in a single message)
- Added ability to disable extra redirects for some sites, forced login in such cases
- Proper support for existing custom rules in preview-rules/edit-rules forms
- Implemented better concurrency for page preview (custom parser)
- Optimized downloading of page resources (images and others embedded elements)
- Better support of reply-to field
- Fixed an issue that caused the use of an email alias instead of the correct email in some rare cases
- Improved extraction of pages with multiple redirects
- Fixed processing of parametrized and dynamic images
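The relative-URL fixes listed above (image URLs without a leading /, paths like “../../”) boil down to resolving a reference against the page address. A minimal sketch using Python's standard `urllib.parse.urljoin` is shown here for illustration only; `resolve_resource_url` and the example.com addresses are made up, and this is not how uKeeper itself is implemented.

```python
from urllib.parse import urljoin


def resolve_resource_url(page_url: str, resource_ref: str) -> str:
    """Turn a (possibly relative) resource reference into an absolute URL."""
    return urljoin(page_url, resource_ref)


page = "http://example.com/blog/2012/post.html"

# Relative reference without a leading slash: resolved against the page's directory.
print(resolve_resource_url(page, "pic.png"))
# -> http://example.com/blog/2012/pic.png

# Reference that climbs up with "../../".
print(resolve_resource_url(page, "../../img/pic.png"))
# -> http://example.com/img/pic.png

# Reference with a leading slash: resolved against the site root.
print(resolve_resource_url(page, "/img/pic.png"))
# -> http://example.com/img/pic.png
```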
Internal improvements, mostly around custom parsing.
- Added ability to change User-Agent for custom sites. For instance, it allows switching to the mobile version if needed.
- Added generic url-converter for custom url mapping. This is useful for tricky sites where the article link should be derived.
- Excluded elements in custom parsers can be specified by attribute value, in addition to class name and id.
- DKIM is disabled for now; it seems to produce some sort of incomplete signature
- Development migrated to IntelliJ IDEA
- Added optional multi-link support. To turn it on put ’’ at the beginning of your subject.
- Added DKIM (DomainKeys Identified Mail) signature to all ukeeper’s emails. It should make our emails treated better by anti-spam filters.
- Added support of basic authorization for incoming links. It is still a very bad idea to email your user:password, but if you really have to, ukeeper will process it.
- For the most forgetful users - if you didn’t put any link, either in subject or body, ukeeper will try to do a basic search for you, based on provided subject.
- Added support of user-defined parsing rules. Not activated yet for most users, just for a few selected beta-testers.
- Improved detection and processing/expanding of short urls.
- Added unlimited depth for parents in custom parser’s matching.
- Added support of aliases for the most popular email services.
- Some minor outlook-related fixes.
- improved processing of short links. In some cases such redirects caused encoding loss and prevented custom parsing.
- added special extraction for dzone.com links
- improved detection of slow pages and implemented timeout auto-adjustment
- fixed a rare problem with incorrect title for persistent subject mode
- preparation for user-defined parsing parameters with live preview
- We have migrated to uservoice for all types of ticketing, support and sharing new ideas. Feel free to hit us with your bright ideas and sad bug reports.
- The "Report error" link will open a simple form where you can add a description and your name (optional). This form is integrated with our new ticketing system.
- Online chat proved to be just a useless toy - removed.