uKeeper - from link to full page

Jul 12th, 2012

Added ability to use custom parsers in FULL mode
Improved compatibility with some websites that rejected uKeeper’s image requests
Corrected processing of image/resource URLs that mix slashes and backslashes
Made the custom parser smarter; it can now detect the first match
Added support for email addresses containing “+”
More fixes for empty titles
Fixed web-ukeeper for URLs containing “&”
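
The mixed slash/backslash fix above can be illustrated with a minimal sketch. This is not uKeeper's actual code; the function name `normalize_resource_url` and the example URLs are assumptions made for illustration, and the approach (treat every backslash as a forward slash, then resolve against the page URL) is a simplification that would also rewrite backslashes inside a query string.

```python
from urllib.parse import urljoin


def normalize_resource_url(page_url: str, resource_url: str) -> str:
    """Resolve a resource URL against its page, treating "\\" as "/".

    Hypothetical helper: some pages reference images with Windows-style
    separators, which a strict URL resolver would keep as literal characters.
    """
    cleaned = resource_url.strip().replace("\\", "/")
    # urljoin handles relative paths, "../" segments and absolute URLs.
    return urljoin(page_url, cleaned)


page = "http://example.com/blog/post.html"
print(normalize_resource_url(page, "images\\2012/pic.png"))
print(normalize_resource_url(page, "..\\img/a.png"))
```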

Jul 6th, 2012

Improved accuracy of Outlook link processing. In some cases silly Outlook decides to split your link.
Simplified and unified support for exotic charsets.
Resolved problems with content and subject in different encodings in forced ! mode.
If a page has no title and a title can’t be extracted from the content, uKeeper uses the site’s domain as the title.
Resolved a problem with non-Latin URLs and URLs with special characters.
Increased the timeout for data extraction so that slow sites and/or big requests can be processed.
Fixed a processing issue for some links with double quotes.
Added the ability to customize article extraction in the new parser on the fly.
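
The title fallback described above could look roughly like the sketch below. The function name `choose_title` and its parameters are hypothetical; the only behavior taken from the notes is the order of preference: the page's own title, then a title extracted from the content, then the site's domain.

```python
from urllib.parse import urlsplit


def choose_title(page_title, extracted_title, url):
    """Pick the first non-empty title, falling back to the site's domain."""
    for candidate in (page_title, extracted_title):
        if candidate and candidate.strip():
            return candidate.strip()
    # No usable title anywhere: use the host name (or the raw URL if the
    # URL has no host part at all).
    return urlsplit(url).hostname or url


print(choose_title(None, "", "http://example.com/some/article"))
```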

Jun 29th, 2012

Fixed a bug, introduced in 0.12.2, that prevented processing of short, case-sensitive links
Added support for some upper-case links that are supposed to be lower-case
Added smarter processing for Twitter statuses. uKeeper will try to extract the link from the status, if present.
Added the ability to implement custom preprocessors for predefined sites. This allows all sorts of interesting tricks; for example, for map sites I will be able to request the print-friendly version instead of the regular one that is so hard to get data from.
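
As an illustration of pulling a link out of a Twitter status, here is a minimal sketch. It is an assumption about the general approach, not uKeeper's implementation: the regular expression, the `first_link` name and the sample status text are all made up for the example.

```python
import re

# Deliberately simple pattern: a scheme followed by anything that is not
# whitespace, a quote or an angle bracket.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def first_link(status_text):
    """Return the first http(s) link in the text, or None if there is none."""
    match = URL_RE.search(status_text)
    if match is None:
        return None
    # Trailing punctuation usually belongs to the sentence, not the link.
    return match.group(0).rstrip(".,;:!?")


print(first_link("Great read http://example.com/post, via @someone"))
```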

Jun 28th, 2012

Added a new way to send links: via Jabber. See Extras “uKeeper via Jabber”. Not fully polished yet, but it works.
Fixed a problem with “(” and “)” in links that prevented correct processing.
Added the ability to capture YouTube links.

Jun 20th, 2012

This is a fairly significant update with massive improvements in data extraction. Version 0.12.1 adds several different methods of article detection and combines all of these methods to get a clean and complete page.

Improved accuracy of article detection and extraction.
Added ability to process multi-page articles.
Added support for new binary formats: doc, xls, ppt, vsd, vst, zip, gz, tgz.
Added the ability to adjust extraction rules manually. This should allow almost instant fixes for reported problems.
Added multi-step analysis of extracted data with automatic tuning, if necessary.
Smarter detection of the article’s title.
Added initial content extraction from emailed tweets. So far tested just on a few iOS clients. Your feedback will help to make it right.
Fixed an issue with broken formatting for some extracted pages.
Fixed an issue with partial articles for some links.
Improved processing of non-article/index pages, like search results and so on.
Fixed processing of links with spaces.
Fixed an incorrect support link in uKeeper’s error email.
Fixed the missing error email when the response is too big and cannot be sent.
Disabled support for multi-line links until I have some smart way to handle them without affecting pure, single-line links with some content below.

Jun 15th, 2012

Added an initial version of a very experimental article extractor. For now you can try it by adding “%” at the start of the subject. It’s buggy and fresh, but has great potential.
Resolved an issue with double quotes in the encoding info sent by some crazy servers. In rare cases it caused a “reject email” from uKeeper.
Resolved a problem with multi-line links. Seems like it was mostly an Outlook thing.
Optimized downloading and processing of embedded pictures. The same image won’t be processed multiple times and won’t be attached multiple times anymore.
If the header’s encoding differs from the meta info, uKeeper will try to autodetect the encoding from the HTML; if the detected encoding matches either the header or the meta value, that one will be used.
Added automatic code deployment to all worker nodes
Created a template for new worker nodes and automation for a quick initial build.
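
The encoding rule above (header vs. meta vs. autodetection) can be sketched as follows. This is only an illustration under stated assumptions: the names `resolve_charset` and `_canonical` are hypothetical, the detected charset is passed in rather than computed, and the notes do not say what happens when autodetection agrees with neither source, so the sketch simply trusts the header in that case.

```python
import codecs


def _canonical(name):
    """Map a charset label to Python's canonical codec name, or None."""
    if not name:
        return None
    try:
        # Some servers wrap the charset in quotes; strip them first.
        return codecs.lookup(name.strip().strip("\"'")).name
    except LookupError:
        return None


def resolve_charset(header_charset, meta_charset, detected_charset,
                    default="utf-8"):
    header = _canonical(header_charset)
    meta = _canonical(meta_charset)
    detected = _canonical(detected_charset)
    if header and meta and header != meta:
        # Conflict: let the autodetected charset break the tie.
        if detected in (header, meta):
            return detected
        # Assumption (not stated in the notes): prefer the header.
        return header
    return header or meta or detected or default


print(resolve_charset("windows-1251", "utf-8", "UTF8"))
```

Comparing canonical codec names rather than raw labels means aliases such as “UTF8” and “utf-8” count as the same encoding.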

Jun 10th, 2012

Responses migrated to SQS + S3. This allows more reliable delivery and higher redundancy. It also makes it simpler to scale uKeeper out.
Added support for partially defined charsets and some wrong, but widely used, aliases.
Improved binary email for images.
Added auto-tuning for workers and several protections against excessive load.
Added protection against multiple submissions of the same request by the same user within a short period of time.
Fixed an issue with multiple error messages sent out in some rare cases.
Made message polling less aggressive and more efficient.
Other minor fixes.
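
Protection against repeated submissions of the same request can be sketched as a small time-window filter. Everything here is an assumption for illustration: the `DuplicateGuard` class, the 60-second window and the in-memory dictionary (a real multi-worker service would need shared storage).

```python
import time


class DuplicateGuard:
    """Reject a (user, url) pair seen again within `window_seconds`."""

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock          # injectable so the guard can be tested
        self._seen = {}             # (user, url) -> time of first sighting

    def allow(self, user, url):
        now = self.clock()
        # Forget entries that are older than the window.
        self._seen = {key: seen_at for key, seen_at in self._seen.items()
                      if now - seen_at < self.window}
        key = (user.lower(), url)
        if key in self._seen:
            return False
        self._seen[key] = now
        return True


guard = DuplicateGuard(window_seconds=60)
print(guard.allow("user@example.com", "http://example.com/a"))
print(guard.allow("user@example.com", "http://example.com/a"))
```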

Jun 7th, 2012

Significantly improved memory utilization and implemented smart allocation with a prediction mechanism. In practice this lets uKeeper process links faster and with a much better level of concurrency.
Resolved a compatibility problem between some pages and Gmail.
Core processors fully switched to highly efficient NIO reading.
Added protection against double-posting. It was theoretically possible in some emergency situations.
User DB migrated to high-performance DynamoDB storage with local caching.
Primary queues, which hold requests that arrive via email or web, are now isolated in SQS queues with delayed request removal. This will increase resilience and guarantee delivery even if all of uKeeper’s back-end servers are unavailable at the same moment.
Special treatment for incorrectly formatted content-types
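
The idea behind delayed request removal is that a request is deleted from the queue only after it has been processed successfully, so a crashed or failing worker never loses it. The sketch below shows that pattern with a toy in-memory queue; it is not the SQS API, and `InMemoryQueue`, `process_one` and the handlers are hypothetical names. In SQS itself the hiding of in-flight messages is done by the visibility timeout.

```python
class InMemoryQueue:
    """Toy queue: received messages are hidden, not removed."""

    def __init__(self):
        self._messages = {}      # message id -> body
        self._in_flight = set()  # ids currently handed to a worker
        self._next_id = 0

    def send(self, body):
        self._next_id += 1
        self._messages[self._next_id] = body
        return self._next_id

    def receive(self):
        for message_id, body in self._messages.items():
            if message_id not in self._in_flight:
                self._in_flight.add(message_id)
                return message_id, body
        return None

    def delete(self, message_id):
        self._messages.pop(message_id, None)
        self._in_flight.discard(message_id)

    def release(self, message_id):
        # Make the message visible again (a real queue does this on timeout).
        self._in_flight.discard(message_id)


def process_one(queue, handler):
    """Handle one message; remove it only if the handler succeeds."""
    item = queue.receive()
    if item is None:
        return False
    message_id, body = item
    try:
        handler(body)
    except Exception:
        queue.release(message_id)   # keep the request for a later retry
        return False
    queue.delete(message_id)        # delayed removal: only after success
    return True
```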

Jun 6th, 2012

In the last two weeks we had two crash events. This is totally unacceptable, even keeping in mind the current beta status of uKeeper. The service is not supposed to fail under any circumstances and has to be rock-solid and highly reliable. Both problems were caused by unexpected events that led to extremely high memory utilization and, as a result, kernel termination of some of uKeeper’s processes.

Jun 2nd, 2012

Added encoding autodetection for pages and sites without any sign of charset info, using the juniversalchardet library
Fixed rejection of some SSL pages with non-trusted certificates
Fixed case-sensitive detection of users’ email addresses, which mistakenly treated addresses differing only in letter case as two different users
Improved performance of HTML processing
Updates to user info are now immediately available for web clipping (bookmarklet/extension) and no longer give “user not found”
Better handling of some Twitter links as well as links inside a div
An empty subject no longer causes a missing header in uKeeper’s email
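
The case-sensitivity fix for user emails amounts to normalizing the address before it is used as a lookup key. A minimal sketch, assuming a hypothetical `user_key` helper and made-up addresses:

```python
def user_key(email):
    """Normalize an email address for use as a user lookup key."""
    # Assumption: the whole address is compared case-insensitively.
    return email.strip().casefold()


users = {user_key("Reader@Example.com"): {"plan": "beta"}}
print(user_key("reader@example.COM") in users)
```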
