I was invited to join a meeting about implementing a FriendFeed-like feed system. Here are some ideas about the requirements and architecture, which I typed on my BlackBerry during the meeting.
- Like FriendFeed, the product can import external RSS, so we separate the system into two parts: the RSS crawl system and the feed pubsub system. The pubsub system has no responsibility for fetching feeds from their sources. A feed entry itself stores only the feed summary and URL.
- We decided to use the INBOX approach, which pushes each published feed into the data table of every subscriber. For more information on how this works, refer to Scaling a Microblogging Service – Part I.
- The user’s homepage is an aggregated result. We chose to return a limited window of recent, real-time data, with no infinite pagination. But the user’s own feed (the user’s profile page) may cover a bigger date range.
- The unsubscribe logic has two choices: delete or keep the history data in one’s inbox. We decided to keep it.
- If the feed source has been deleted, do we need to delete all references to it in every subscriber’s inbox? If so, each feed pushed to the pubsub system needs a unique resource ID. Another problem: after the source is updated, should we publish a new feed or update the current feed?
- How to manage group (QUN in Chinese) feeds: deliver to every member’s inbox, or share a group inbox?
- How to implement the feed comment logic: publish comments to the feed system, or design a standalone comment system? We prefer a standalone comment system that doesn’t publish comments back to the feed system.
- Every feed has a media type, such as text, video, or image, so the subscribe API can retrieve only a limited set of media types (for example, text only for mobile devices). A feed may also have tags.
- The read/unread count is easy to implement, but the load is heavy. (QQ / QQzone may have such logic.)
- We need an open API for 3rd-party clients (like Twitter clients) and RSS feeds, possibly with OAuth integration.
- The storage may be like FriendFeed’s MySQL schema (see How FriendFeed uses MySQL to store schema-less data) or use Amazon SimpleDB.
- We may add support for XMPP Publish-Subscribe, or PEP (Personal Eventing Protocol), for pushing real-time updates to users in the future.
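
To make the INBOX idea more concrete, here is a minimal in-memory sketch of how the push, unsubscribe, delete, and media-type points above could fit together. It is only an illustration under my own assumptions, not what we decided to build: the names `FeedItem`, `InboxFeed`, `resource_id`, and `homepage_limit` are all hypothetical, and a real system would use a database table per inbox instead of Python lists.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class FeedItem:
    """One feed entry (hypothetical shape): only summary and URL are stored."""
    resource_id: str            # unique id, so a deleted source can be purged
    author: str
    summary: str
    url: str
    media_type: str = "text"    # e.g. "text", "video", "image"


class InboxFeed:
    """Fan-out-on-write: every published item is copied to each subscriber."""

    def __init__(self, homepage_limit: int = 50) -> None:
        if homepage_limit < 1:
            raise ValueError("homepage_limit must be at least 1")
        self.homepage_limit = homepage_limit
        self.subscribers: Dict[str, Set[str]] = defaultdict(set)
        self.inboxes: Dict[str, List[FeedItem]] = defaultdict(list)

    def subscribe(self, user: str, author: str) -> None:
        self.subscribers[author].add(user)

    def unsubscribe(self, user: str, author: str) -> None:
        # Only stop future deliveries; history already in the inbox is kept.
        self.subscribers[author].discard(user)

    def publish(self, item: FeedItem) -> None:
        # Push the item into the inbox of every current subscriber.
        for user in self.subscribers[item.author]:
            self.inboxes[user].append(item)

    def delete(self, resource_id: str) -> None:
        # Remove every reference to a deleted source item from all inboxes.
        for inbox in self.inboxes.values():
            inbox[:] = [i for i in inbox if i.resource_id != resource_id]

    def homepage(self, user: str,
                 media_types: Optional[Set[str]] = None) -> List[FeedItem]:
        # Newest first, capped at homepage_limit (no infinite pagination).
        items = self.inboxes.get(user, [])
        if media_types is not None:
            items = [i for i in items if i.media_type in media_types]
        return list(reversed(items[-self.homepage_limit:]))
```

A mobile client, for example, might call `homepage(user, media_types={"text"})` to skip video and image items. The sketch also shows why the delete question matters: without a unique `resource_id`, there is no cheap way to find the copies that were already pushed.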