uKeeper - from link to full page

Jan 4th, 2013

Since yesterday I have been seeing an exceptionally high ratio of failed or significantly delayed S3 operations. As you may know, uKeeper's internals run on AWS and make use of several services such as S3, SQS, DynamoDB and some others. S3 is especially important for the distributed data retrieval process. Practically all article-related resources, like images and other attachments, come from S3.

S3 works as eventually consistent storage, which means a simple thing: in some cases data written to S3 is not immediately visible to all readers. That is fine if readers expect such behavior and can wait until the updated data becomes available. uKeeper is one of those smart readers and has worked this way since day one. Usually the waiting period was 100-200ms, in very rare cases 500ms. As a paranoid developer I put in place a waiting period of up to 15 sec, but since Apr 2012 I have seen only 4 cases with unusually high latency (~3 sec).
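A reader that tolerates eventual consistency can be sketched as a polling loop with a deadline. This is only an illustration under assumptions, not uKeeper's actual code: the function name wait_for_object, the fetch callable (expected to return None while the object is not yet visible) and the doubling delay are all hypothetical. Only the 15 sec upper bound and the ~100ms typical delay come from the text above.

```python
import time


def wait_for_object(fetch, max_wait=15.0, initial_delay=0.1,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll fetch() until it returns a value or max_wait seconds elapse.

    fetch is a hypothetical callable that returns None while the object
    is not yet visible to this reader.
    """
    deadline = clock() + max_wait
    delay = initial_delay
    while True:
        value = fetch()
        if value is not None:
            return value
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError("object still not visible after %.1fs" % max_wait)
        # Never sleep past the deadline; double the delay between attempts.
        sleep(min(delay, remaining))
        delay *= 2
```

The sleep and clock parameters exist only so the loop can be exercised without real waiting. A caller would pass a fetch function that wraps the actual storage read.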

However, since yesterday S3 has been acting differently: a relatively large share of submitted objects is still not available hours after the write! I have informed AWS support, and it looks like they are trying to fix it. I see the number of such incidents decreasing dramatically, and over the last 12 hours I got just 4 delayed writes.

On the uKeeper side this issue initially caused “request rejected” for some users. As soon as the problem was detected I put in place a hot-fix that allows articles to be processed even if one of the resources failed or was delayed by S3. In this case the user will get the article, but it may have a missing picture. Please note: this is a really, really rare case now, and hopefully AWS will get it fixed completely very soon.

From this incident I learned a few important things about “what to do if AWS is acting strange” and I am going to implement a new set of backup strategies for cases like this.

UPD: 01/05 13:57 CDT – The problem with S3 was resolved completely.

Dec 31st, 2012

Added ability to catch pages in PDF format. Adding %p to the subject will do the trick.
Added ability to catch pages in image format (PNG): add %i to the subject.
Added new, regexp-based rules for deeper URL matching. They allow tweaking the parsing for different parts of the same site if needed.
Added detection and extraction of embedded youtube video
Added support for non-latin URLs
Implemented special parsing for twitter’s conversations
Added support for twitpic images and surrounding text
Fixed incorrect detection of image type in Evernote
Fixed issue with relative image URLs without leading /
Fixed issue with urls with “+” in web catcher
Improved instapaper & pocket forwarding
Header shows expanded url, not the short one

Both the PDF and PNG generators are based on wkhtmltopdf.
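The regexp-based URL rules from this release could work roughly as follows. This is a sketch under assumptions, not the real rule format: the RULES table, the example.com patterns, the selector strings and the pick_rule helper are all hypothetical. The idea is that the first matching pattern wins, so more specific patterns for one part of a site go before the catch-all for the rest of it.

```python
import re

# Hypothetical rule table: (URL pattern, parsing rule), most specific first.
RULES = [
    (re.compile(r"^https?://example\.com/blog/"), "div.post-body"),
    (re.compile(r"^https?://example\.com/"), "div#content"),
]


def pick_rule(url, rules=RULES, default=None):
    """Return the parsing rule of the first pattern that matches url."""
    for pattern, rule in rules:
        if pattern.search(url):
            return rule
    return default
```

With this table, a blog post URL on example.com gets its own rule, while any other page on the same site falls through to the site-wide one.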

Oct 31st, 2012

Added link forwarding for instapaper and pocket destinations
Fixed extraction of multiple tags
Improved processing of binary attachments
Preview for cases with multiple URL matches
More space-related URL fixes
Some other improvements, mostly performance related

Oct 16th, 2012

Workaround for an Evernote issue that caused PDF dups.
Added special parser and link converter for google-groups.
Added ability to catch pages/links preprocessed by instapaper or readability.
Better support for the title meta-tag. It should address duplicated titles in some cases and generally makes title extraction smarter and more predictable.
Fixed incorrect tag detection from a subject with links (empty body case).
Improved XMPP link detection for adium client.
Fixed parsing of urls with “+”
Fixed processing of links with relative path like “../../”
Proper cleaning of Unicode non-breaking spaces in urls
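Two of the URL fixes above, relative paths like “../../” and non-breaking spaces, can be illustrated with Python's standard urllib.parse.urljoin. This is a sketch only and not how uKeeper itself is implemented: the helper names clean_url and resolve_link are hypothetical, and treating a non-breaking space as an ordinary space to be trimmed is an assumption about what "proper cleaning" means.

```python
from urllib.parse import urljoin

NBSP = "\u00a0"  # Unicode non-breaking space


def clean_url(raw):
    """Turn non-breaking spaces into plain spaces, then trim the ends."""
    return raw.replace(NBSP, " ").strip()


def resolve_link(page_url, href):
    """Resolve href (possibly relative, e.g. '../../x') against page_url."""
    return urljoin(clean_url(page_url), clean_url(href))
```

For example, resolving "../../img/pic.png" against "http://example.com/a/b/c/page.html" walks two directories up from /a/b/c/ and gives "http://example.com/a/img/pic.png".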

Sep 24th, 2012

Added new jquery selector-syntax for the new custom parser
Added support of custom titles for selectors-based parser
Special support for new, dynamic blogger’s themes
Improved resiliency for most of S3-based operations
End-to-end fully automatic monitoring systems for regular and full capturing
Fixed processing of tweets with an image only and with internal redirects
Fixed reporting and base processing for links with hash character
Migrated to latest stable Netty 3.5.5
More fixes for problematic redirects
Simplified download tasks manager
Custom parsing settings available for more testers

Sep 2nd, 2012

Added support of multiple links via jabber (up to 10 links in a single message)
Added ability to disable extra redirects for some sites, forced login in such cases
Proper support for existing custom rules in preview-rules/edit-rules forms
Implemented better concurrency for page preview (custom parser)
Optimized downloading of page resources (images and other embedded elements)
Better support of the reply-to field
Fixed an issue that caused an email alias to be used instead of the correct email in some rare cases
Improved extraction of pages with multiple redirects
Fixed processing of parametrized and dynamic images

Aug 14th, 2012

Internal improvements, mostly around custom parsing.

Added ability to change User-Agent for custom sites. For instance, it allows switching to the mobile version if needed.
Added generic url-converter for custom url mapping. This is useful for tricky sites where the article link should be derived.
An excluded element in custom parsers can now be specified by attribute value, in addition to class name and id.
DKIM is disabled for now; it seems to produce some sort of incomplete signature.
Development migrated to IntelliJ IDEA

Aug 7th, 2012

Added optional multi-link support. To turn it on put ’^’ at the beginning of your subject.
Added DKIM (DomainKeys Identified Mail) signature to all ukeeper’s emails. This should make anti-spam filters treat our emails better.
Added support of basic authorization for incoming links. It is still a very bad idea to email your user:password, but if you really have to, ukeeper will process it.
For the most forgetful users: if you didn’t put any link, either in subject or body, ukeeper will try to do a basic search for you, based on the provided subject.
Added support of user-defined parsing rules. Not activated yet for most users, just for a few selected beta-testers.
Improved detection and processing/expanding of short urls.
Added unlimited depth for parents in custom parser’s matching.
Added support of aliases for the most popular email services.
Some minor outlook-related fixes.
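Basic authorization for an incoming link of the form http://user:password@host/path can be sketched with the standard library. This only illustrates the general technique and is not ukeeper's implementation: the function name split_credentials is hypothetical. It strips the credentials from the URL and turns them into the Authorization header defined for HTTP Basic authentication.

```python
import base64
from urllib.parse import urlsplit, urlunsplit


def split_credentials(url):
    """Return (url_without_credentials, headers or None)."""
    parts = urlsplit(url)
    if parts.username is None:
        return url, None
    # Rebuild the host part without the "user:password@" prefix.
    host = parts.hostname
    if parts.port is not None:
        host = "%s:%d" % (host, parts.port)
    clean = urlunsplit((parts.scheme, host, parts.path,
                        parts.query, parts.fragment))
    raw = "%s:%s" % (parts.username, parts.password or "")
    token = base64.b64encode(raw.encode("utf-8")).decode("ascii")
    return clean, {"Authorization": "Basic " + token}
```

Note that Basic credentials are only base64-encoded, not encrypted, which is one more reason why emailing user:password remains a bad idea.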

Jul 20th, 2012

improved processing of short links. In some cases such a redirect caused encoding loss and prevented custom parsing.
added special extraction for dzone.com links
improved detection of slow pages and implemented timeout auto-adjustment
fixed a rare problem with incorrect title for persistent subject mode
preparation for user-defined parsing parameters with live preview

Jul 13th, 2012

We have migrated to uservoice for all types of ticketing, support and sharing new ideas. Feel free to hit us with your bright ideas and sad bug reports.
The “Report error” link will open a simple form where you can add a description and your name (optional). This form is integrated with our new ticketing system.
Online chat proved to be just a useless toy - removed.
