Tutorial: Creating a media provider for Divx's Stage6 site
===========================================================

In this tutorial we write a MediaProvider that gives Elisa the ability to
play the videos found on Divx's Stage6 broadcasting site.

We begin by importing the Elisa modules we will need:

.. code-block:: python

    from elisa.base_components.media_provider import MediaProvider, UriNotMonitorable
    from elisa.core.media_uri import MediaUri
    from elisa.core.utils import deferred_action
    from elisa.core import common

Here we import the MediaProvider class, which our component will inherit from.
The MediaUri class represents an Elisa URI of the form:
scheme://[user:password@][domain][:port][/path][#fragment][?params]
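
As a rough illustration of these components, the sketch below splits two of
the URIs used later in this tutorial with the standard ``urllib.parse`` module
of Python 3. It is only a stand-in: it does not use Elisa's MediaUri class,
and the video id 1234 is made up.

.. code-block:: python

    from urllib.parse import urlsplit

    # A video URI and a tag ("directory") URI as used by this provider.
    # The id 1234 is hypothetical.
    video = urlsplit("stage6://video.stage6.com/1234/.avi")
    tag = urlsplit("stage6:///music")

    # scheme, domain and path of the video URI
    print(video.scheme, video.netloc, video.path)
    # the tag URI has an empty domain, only a scheme and a path
    print(tag.scheme, repr(tag.netloc), tag.path)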

'deferred_action' lets us manage a queue of actions. Through the 'common'
module, we can access Elisa's managers, such as the MediaManager.

Next we import Twisted-related modules. Elisa's architecture and API rely
heavily on Twisted, mostly to enforce non-blocking function calls.
We also import the BeautifulSoup module, to make parsing the Stage6 HTML
pages easier:

.. code-block:: python

    from twisted.internet import defer, threads
    from twisted.internet import reactor
    from twisted.web import client
    
    from BeautifulSoup import BeautifulSoup

Next we will import a few Python standard components:

.. code-block:: python

    import urllib2
    import re
    from mutex import mutex
  

In order to retrieve the media available through the Stage6 website
(http://stage6.divx.com), we will need to parse the HTML pages for tags and
video URLs. We will create a small StageParser class to achieve that goal:

### ben: Just a question: shouldn't every plugin class inherit from
component, so that it can always use the logger for output instead of doing
what is currently done in the 'get_tags' of this class? If so, we
should say it here!


.. code-block:: python

    class StageParser():
        """
        This class implements a parser to retrieve video titles and
        URLs from a Stage6 HTML page
        """

        # Some regexps that will help retrieve the data we are looking
        # for in HTML pages
        reg_href = re.compile("href=\"(.*)\"")
        reg_img = re.compile("img alt=\"(.*)\" src=\"(.*)\"")
        reg_title = re.compile("title=\"(.*)\"")
        reg_video_id = re.compile("video/(.*)/")

        def __init__(self, string_to_parse):
            """
            @param string_to_parse:         the HTML code to parse
            @type string_to_parse:          string
            """

            self._to_parse = string_to_parse


        def get_tags(self):
            """
            Returns a list of tags as strings. This is parsing the
            HTML data we have in self._to_parse

            @rtype: list of strings
            """

            tags = []
            # In case the Stage6 website is having too many connections
            could_not_connect = '<!-- Could not connect to MySQL host: Too many connections -->'
            if self._to_parse.startswith(could_not_connect):
                tags.append("Could not connect")
                return tags

            # BeautifulSoup is going to help us find the code we're looking for
            b = BeautifulSoup(self._to_parse)
            res = b.findAll('ul', attrs={'class': 'tags-drill'})

            if len(res):
                # Tag names are between <li> tags
                res = res[0].findAllNext('li')
                for tag in res:
                    t = tag.contents[0]
                    for i in t.attrs:
                        if i[0] == 'class':
                            title = t.attrs[0][1]
                            tags.append(title)
                            break

            return tags

        def get_videos(self):
            """
            Returns a list of videos, with their id, title and thumbnail.
            This is parsing the HTML data we have in self._to_parse

            @rtype : list of (string, string, string)
            """

            videos = []

            # Video info is between <div> tags with class='video-title'
            b = BeautifulSoup(self._to_parse)
            res = b.findAll('div', attrs={'class': 'video-title'})

            count = 0
            if len(res):
                res = res[0].findAllNext('div')
                for i in res:
                    if count == 1: # BeautifulSoup gives the same item twice in our case
                        count = 0
                        continue

                    line = str(i)
                    # get the href to retrieve video id
                    match = self.reg_href.search(line)
                    # retrieve the thumbnail location
                    img = self.reg_img.search(line)
                    if match and len(match.groups()):
                        # retrieve video id hidden in /user/name/videos/video_id/
                        video_id = self.reg_video_id.search(match.groups()[0])

                        if video_id and len(video_id.groups()):
                            title = self.reg_title.search(line)
                            if title and len(title.groups()):
                                if img and len(img.groups()) > 1:
                                    img = img.groups()[1]
                                else:
                                    img = ''
                                # Finally we add the video id, its title and thumbnail location
                                videos.append((video_id.groups()[0],
                                               title.groups()[0],
                                               img))
                                count = 1

            return videos

As you can see, this class is totally dependent on the HTML code of the Stage6
website. If the pages on this website change, the parsing code will have to be
modified.

This class accepts a string containing HTML code in its constructor.
Two functions are then available to retrieve the information:

- get_tags(): returns a list of strings containing the video tags found in the HTML code.

- get_videos(): returns a list of tuples, each containing the video id, the name of the video, and its accompanying thumbnail.
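
The extraction done by get_videos() can be tried without BeautifulSoup or
network access. The sketch below runs on a single made-up line of markup; the
real Stage6 pages may have looked different. It uses non-greedy variants of
the patterns (``[^"]*`` instead of ``.*``), which is a change from the class
above and not part of it, so that each group stops at the closing quote.

.. code-block:: python

    import re

    # Non-greedy variants of the tutorial's patterns (an assumption,
    # not the original regexps)
    REG_HREF = re.compile(r'href="([^"]*)"')
    REG_TITLE = re.compile(r'title="([^"]*)"')
    REG_IMG = re.compile(r'img alt="([^"]*)" src="([^"]*)"')
    REG_VIDEO_ID = re.compile(r'video/([^/]*)/')

    # Hypothetical markup, written for this example only
    line = ('<div class="video-title">'
            '<a href="/user/bob/video/1234/" title="My clip">'
            '<img alt="My clip" src="http://example.com/thumb.jpg" />'
            '</a></div>')

    def extract_video(line):
        """Return (video_id, title, thumbnail), or None without a video link."""
        href = REG_HREF.search(line)
        title = REG_TITLE.search(line)
        if not href or not title:
            return None
        video_id = REG_VIDEO_ID.search(href.group(1))
        if not video_id:
            return None
        img = REG_IMG.search(line)
        thumbnail = img.group(2) if img else ''
        return (video_id.group(1), title.group(1), thumbnail)

    video = extract_video(line)
    print(video)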

Now that we have our parser ready, we can start working on our StageMedia
MediaProvider itself:

.. code-block:: python

    class StageMedia(MediaProvider):
        """
        This class implements stage6 video website browsing support
        http://stage6.divx.com
        """

        # URL where we are going to look for video tags
        TAGS_URL = "http://stage6.divx.com/videos/"
        # URL where the videos are actually located
        VIDEOS_URL = "stage6://video.stage6.com/"

        def __init__(self):
            """
            We also initialize the base class here. Directory listings
            are cached in the self._cache dict, which is protected from
            concurrent access by a mutex
            """

            MediaProvider.__init__(self)

            # We create a cache of retrieved results
            self._cache = {}
            self._mutex = mutex()

            # Here we create a DeferredActionsManager, which lets us
            # manage a queue of deferred actions. This is useful
            # for providers that use a resource-hungry data protocol,
            # because it ensures only one request runs at a time
            self._def_action = deferred_action.DeferredActionsManager()


Here we explicitly inherit from the MediaProvider class, so we have to
respect the API it defines. We also create the DeferredActionsManager which,
as explained above, lets us make one request at a time to the Stage6
website.
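
To give an idea of what "one request at a time" means, here is a minimal
standalone sketch of a serial action queue. It is not Elisa's
DeferredActionsManager: the class name ``SerialActionQueue`` is hypothetical,
it uses a worker thread and ``concurrent.futures.Future`` instead of Twisted
deferreds, and it treats lower numbers as higher priority, which is an
assumption. Only the ``insert_action(priority, function, *args)`` call shape
is taken from this tutorial.

.. code-block:: python

    import itertools
    import queue
    import threading
    from concurrent.futures import Future


    class SerialActionQueue:
        """Hypothetical stand-in: runs queued actions one at a time."""

        def __init__(self):
            self._queue = queue.PriorityQueue()
            # The counter keeps first-in, first-out order within a priority
            self._counter = itertools.count()
            worker = threading.Thread(target=self._run, daemon=True)
            worker.start()

        def insert_action(self, priority, func, *args, **kwargs):
            """Queue func(*args, **kwargs) and return a Future for its result."""
            future = Future()
            self._queue.put((priority, next(self._counter),
                             func, args, kwargs, future))
            return future

        def _run(self):
            # A single worker means actions never overlap
            while True:
                _, _, func, args, kwargs, future = self._queue.get()
                try:
                    future.set_result(func(*args, **kwargs))
                except Exception as error:
                    future.set_exception(error)


    actions = SerialActionQueue()
    order = []

    def fetch(name):
        # Stands in for a slow download
        order.append(name)
        return name.upper()

    futures = [actions.insert_action(0, fetch, name)
               for name in ("tags", "videos")]
    results = [f.result(timeout=5) for f in futures]
    print(results)

The caller gets a handle back immediately and the work happens later, which is
the same contract the provider methods below follow when they return the value
of ``insert_action()``.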

.. code-block:: python

    def scannable_uri_schemes__get(self):
        """
        Retrieve the URI schemes that can be scanned by the
        media_scanner. Since media scanning can be a heavy and long
        task, the MediaProvider developer can choose to make the
        media_scanner skip URIs whose scheme does not appear in the
        returned list.
        """

        # We do not need media scanning. We can provide the metadata ourselves
        return {}
                
    def supported_uri_schemes__get(self):
        """
        Retrieve the URI schemes supported by the provider. Each
        scheme has a priority. The highest priority is 0, which means
        the provider will always be used to read data from that scheme.

        @rtype: dict mapping URI schemes (strings) to priorities
                (positive integers)
        """

        # our URIs will appear as stage6://
        return { 'stage6': 0 }


This function will be called by the MediaManager to find out which
MediaProvider can handle a URI it has to process.
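
How the MediaManager actually uses these priorities is not shown in this
tutorial. The sketch below is only one plausible reading, assuming that the
provider with the lowest number wins for a given scheme. ``pick_provider`` and
the ``HttpFallback`` provider are made-up names for the example.

.. code-block:: python

    def pick_provider(scheme, providers):
        """Return the provider name with the best priority for scheme, or None.

        providers maps a provider name to its {scheme: priority} dict.
        0 is treated as the highest priority.
        """
        candidates = [(schemes[scheme], name)
                      for name, schemes in providers.items()
                      if scheme in schemes]
        return min(candidates)[1] if candidates else None


    # "HttpFallback" is a made-up provider, used only for the comparison
    providers = {
        "StageMedia": {"stage6": 0},
        "HttpFallback": {"http": 0, "stage6": 10},
    }

    print(pick_provider("stage6", providers))
    print(pick_provider("ftp", providers))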
        

.. code-block:: python
       
    def get_media_type(self, uri):
        """
        Try to guess as much information as possible about the media
        located at given uri by looking at its file extension, if any. Will
        return something like::

          {'file_type': string (values: 'directory', 'picture', 'audio', 'video'),
           'mime_type': string (example: 'audio/mpeg' for .mp3 uris. can be
                               empty string if unguessable)
           }
        
        @param uri: the URI to analyze
        @type uri:  L{elisa.core.media_uri.MediaUri}
        @rtype:     L{twisted.internet.defer.Deferred}
        """


        # This function is non-blocking: we queue our function
        # self._get_media_type() through our DeferredActionsManager
        # and return the resulting deferred
        return self._def_action.insert_action(0, self._get_media_type, uri)


    def _get_media_type(self, uri):        

        # If the uri starts with the stage6 video domain
        # name, we know it is a video. Otherwise it is
        # considered as a directory
        if repr(uri).startswith(self.VIDEOS_URL):
            return { 'file_type' : 'video',
                     'mime_type' : '' }
        else:
            return { 'file_type' : 'directory',
                     'mime_type' : '' }

The get_media_type() function can be called by an Activity to know which type
of file it is dealing with. This is mainly used to decide how to organize the
media (by music, by videos...) and their structure (tree of directories and
media).
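
The classification rule is simple enough to try on plain strings. This sketch
mirrors _get_media_type() but works without Elisa; ``guess_media_type`` is a
name chosen for the example and the video id is made up.

.. code-block:: python

    VIDEOS_URL = "stage6://video.stage6.com/"

    def guess_media_type(uri_string):
        """Classify a Stage6 URI string as a video or a directory."""
        if uri_string.startswith(VIDEOS_URL):
            return {"file_type": "video", "mime_type": ""}
        return {"file_type": "directory", "mime_type": ""}

    # The root, a tag, and a video with a hypothetical id
    for uri in ("stage6:///", "stage6:///music", VIDEOS_URL + "1234/.avi"):
        print(uri, "->", guess_media_type(uri)["file_type"])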


.. code-block:: python


    def is_directory(self, uri):
        """
        Return True if the uri is a directory

        @param uri: the URI to analyze
        @type uri:  L{elisa.core.media_uri.MediaUri}
        @rtype:     L{twisted.internet.defer.Deferred}
        """

        return self._def_action.insert_action(0, self._is_directory, uri)

    def _is_directory(self, uri):

        # if the uri doesn't start with the stage6 video domain,
        # we know it is a directory
        return not repr(uri).startswith(self.VIDEOS_URL)


    def has_children(self, uri):
        """
        Detect whether the given uri has children. Implies it's a
        directory as well.

        @param uri:      the URI to scan
        @type uri:       L{elisa.core.media_uri.MediaUri}
        @rtype:          L{twisted.internet.defer.Deferred}        
        """

        return self._def_action.insert_action(0, self._has_children, uri)

    def _has_children(self, uri):

        # We can consider that a video tag found on Stage6 always
        # has videos linked to it. We call the private helper because
        # is_directory() returns a deferred, not a bool.
        return self._is_directory(uri)


    def has_children_with_types(self, uri, media_types):
        """
        Detect whether the given uri has children for given media
        types which can be of 'directory', 'audio', 'video',
        'picture'. Implies the URI is a directory as well.

        @param uri:         the URI to scan
        @type uri:          L{elisa.core.media_uri.MediaUri}
        @param media_types: the media_types to look for on the directory
        @type media_types:  list of strings
        @rtype:             L{twisted.internet.defer.Deferred}
        """
        return self._def_action.insert_action(0, self._has_children_with_types, uri, media_types)


    def _has_children_with_types(self, uri, media_types):

        if 'video' in media_types:
            return self._is_directory(uri)
        else:
            return False

#### ben: Stop! We need a small document which explains the idea behind URIs
and how they are handled. What is a directory, what is a child? These things
are needed to understand that part and develop it correctly!

These functions, like the previous one, are mostly used by Activities and Views
to know how to organize and represent the data they are dealing with.
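
As this provider uses them, a "directory" is the root or a tag, and its
"children" are the tags or videos listed under it. The sketch below walks such
a tree using a made-up in-memory listing (the tags and video ids are
hypothetical), only to show the shape that Activities and Views get to see.

.. code-block:: python

    VIDEOS_URL = "stage6://video.stage6.com/"

    # Hypothetical listing: parent URI -> child URIs
    LISTING = {
        "stage6:///": ["stage6:///music", "stage6:///sports"],
        "stage6:///music": [VIDEOS_URL + "1/.avi", VIDEOS_URL + "2/.avi"],
        "stage6:///sports": [VIDEOS_URL + "3/.avi"],
    }

    def is_directory(uri):
        # Anything outside the video domain is the root or a tag
        return not uri.startswith(VIDEOS_URL)

    def walk(uri, depth=0):
        """Return indented lines describing uri and everything below it."""
        kind = "directory" if is_directory(uri) else "video"
        lines = ["  " * depth + "%s (%s)" % (uri, kind)]
        for child in LISTING.get(uri, []):
            lines.extend(walk(child, depth + 1))
        return lines

    tree = walk("stage6:///")
    print("\n".join(tree))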


.. code-block:: python


    def _get_cached_uri(self, uri, children, add_info):
        """
        Return the list of children from a parent URI,
        or None if this URI has not yet been cached
        """

        self._mutex.testandset()

        ret = None
        # If we have the uri cached, return it
        if self._cache.has_key(repr(uri)):
            for i in self._cache[repr(uri)]:
                if add_info:
                    children.append(i)
                else:
                    children.append(i[0])
            ret = children
            
        self._mutex.unlock()
        
        return ret

            
    def _add_to_cache(self, parent, child, info):
        """
        Attach a child to a parent in the cache
        """

        self._mutex.testandset()
        
        parent = repr(parent)
        if not self._cache.has_key(parent):
            self._cache[parent] = [(child, info) ,]
        else:
            self._cache[parent].append((child, info))
            
        self._mutex.unlock() 


These two functions let us add tags and videos to a cache and retrieve them
from it. Thanks to the cache, we do not have to download and parse the HTML
pages again if we already did so before.
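
One caveat: ``mutex.testandset()`` from the Python 2 ``mutex`` module does not
wait when the lock is already held; it just returns False. If a blocking lock
is what is wanted, ``threading.Lock`` may be closer to the intent. The sketch
below shows the same cache written that way, as a standalone class. The name
``ChildrenCache`` and its methods are chosen for this example and are not part
of Elisa.

.. code-block:: python

    import threading


    class ChildrenCache:
        """Maps a parent URI string to a list of (child, info) tuples."""

        def __init__(self):
            self._cache = {}
            self._lock = threading.Lock()

        def add(self, parent, child, info):
            # "with" waits until the lock is free and releases it on exit
            with self._lock:
                self._cache.setdefault(parent, []).append((child, info))

        def get(self, parent, add_info=False):
            """Return the cached children, or None if parent was never cached."""
            with self._lock:
                if parent not in self._cache:
                    return None
                entries = list(self._cache[parent])
            if add_info:
                return entries
            return [child for child, _ in entries]


    cache = ChildrenCache()
    cache.add("stage6:///", "stage6:///music", {})
    cache.add("stage6:///", "stage6:///sports", {})
    print(cache.get("stage6:///"))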


.. code-block:: python

    def get_real_uri(self, uri):
        """
        Returns the original (accessible) uri from a virtual
        uri representation.

        @param uri:     the URI to validate
        @type uri:      L{elisa.core.media_uri.MediaUri}
        @rtype:         L{elisa.core.media_uri.MediaUri}
        """

        # At this point we need to convert our internal stage6
        # uri to the real http uri that can be used to play a video
        # Fortunately, we just have to change the scheme.
        http = MediaUri(uri)
        http.scheme = 'http'
        return http

The get_real_uri() function is called by Activities when they need to access
the real media data. Our stage6:// scheme is virtual: GStreamer cannot
understand it or access the media through it. In the end, the Stage6 videos
are http URLs, so we convert our URI in this function so that an Activity can
actually play the media. In other words, we have to output a URI that
GStreamer can access.
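
The same scheme swap can be shown on plain strings with the Python 3 standard
library. This is only an illustration of the idea: it does not use MediaUri,
``to_http`` is a name chosen for the example, and the video id is made up.

.. code-block:: python

    from urllib.parse import urlsplit, urlunsplit

    def to_http(uri_string):
        """Return uri_string with its scheme replaced by http."""
        parts = urlsplit(uri_string)
        return urlunsplit(parts._replace(scheme="http"))

    # 1234 is a hypothetical video id
    real = to_http("stage6://video.stage6.com/1234/.avi")
    print(real)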


.. code-block:: python

    def _read_url(self, url):
        """
        Read an URI and return its content
        """
        
        f = urllib2.urlopen(url)
        return f.read()

    
    def _retrieve_children(self, uri, list_of_children, add_info=False):
        """
        Retrieve the children of uri and fill the list

        @param uri:                     the URI to analyze
        @type uri:                      L{elisa.core.media_uri.MediaUri}
        @param list_of_children:        List where the children will be appended
        @type list_of_children:         list
        @param add_info:                Add also the thumbnails to the list
        @type add_info:                 bool
        """

        # If the uri requested is in the cache, we return the cached children
        cache = self._get_cached_uri(uri, list_of_children, add_info)
        if cache:
            self.debug('Loaded from cache: %s' % repr(uri))
            return cache

        # if the uri path is /, we have to retrieve the tags from the main stage6 page
        if uri.path == '/':
            # We retrieve the HTML page
            to_parse = self._read_url(self.TAGS_URL)

            # create the parser and retrieve the tags
            parser = StageParser(to_parse)
            tags = parser.get_tags()

            # We add to the children list a MediaUri representing each tag
            for tag in tags:
                t = MediaUri("stage6:///" + tag)
                if add_info:
                    list_of_children.append((t, {}))
                else:
                    list_of_children.append(t)
                # Cache the uri
                self._add_to_cache(uri, t, {})

        # we have a list of tags
        elif len(uri.path):
            path = uri.path[1:] # Remove first slash
            # replace from format tag1/tag2 to tag1+tag2
            path = self.TAGS_URL + path.replace("/", "+")

            # download HTML page and parse it to retrieve the video list
            to_parse = self._read_url(path)
            parser = StageParser(to_parse)
            videos = parser.get_videos()

            # We add to the children list a MediaUri representing each video
            for v in videos:
                t = MediaUri(self.VIDEOS_URL + v[0] + "/.avi")                                
                
                label = v[1].decode("utf-8")
                # set the uri label to the name of the video
                t.label = label
                
                # Keep the thumbnail url in the info dict
                d = { 'thumbnail': v[2] }
                if add_info:
                    list_of_children.append((t, d))
                else:
                    list_of_children.append(t)
                # Always cache the info, so that a later call with
                # add_info=True can still return the thumbnail
                self._add_to_cache(uri, t, d)

        return list_of_children


    def get_direct_children(self, uri, list_of_children):
        """
        Scan the data located at given uri and return a deferred.
        Fills list_of_children. The deferred is called when the
        gathering is finished.

        Typemap of filled result:

          [
             uri:media_uri.MediaUri,
            ...
          ]

        @param uri:                     the URI to analyze
        @type uri:                      L{elisa.core.media_uri.MediaUri}
        @param list_of_children:        List where the children will be appended
        @type list_of_children:         list
        @rtype:                         L{twisted.internet.defer.Deferred}
        """

        return self._def_action.insert_action(0, self._retrieve_children, uri,
                                              list_of_children)


This function is called by Activities to retrieve the content of a directory.
The call is also non-blocking: it returns a deferred and calls the
_retrieve_children() function in another context.
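
Part of the work done by _retrieve_children() is turning a tag path into the
URL of the page listing its videos. Here is that step on its own, as a small
sketch. ``listing_url`` is a name chosen for the example and, unlike the
original code, it also ignores empty path segments such as a trailing slash.

.. code-block:: python

    TAGS_URL = "http://stage6.divx.com/videos/"

    def listing_url(path):
        """Map a tag path such as /tag1/tag2 to the page listing its videos."""
        # Drop empty segments, then join the tags as tag1+tag2
        tags = [segment for segment in path.split("/") if segment]
        return TAGS_URL + "+".join(tags)

    print(listing_url("/music"))
    print(listing_url("/music/live"))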


.. code-block:: python

    def get_direct_children_with_info(self, uri, children_with_info):
        """
        Scan the data located at given uri and return a deferred.
        Fills children_with_info. The deferred is called when the
        gathering is finished, with children_with_info as parameter

        Typemap of filled result:

          [ 
             (uri : media_uri.MediaUri,
              additional info: dict),
            ...
          ]
        
        @param uri:                     the URI to analyze
        @type uri:                      L{elisa.core.media_uri.MediaUri}
        @param children_with_info:      List where the children will be appended
        @type children_with_info:       list
        @rtype:                         L{twisted.internet.defer.Deferred}
        """

        # Same as get_direct_children() except we also fill an information dict
        return self._def_action.insert_action(0, self._retrieve_children, uri, 
                                              children_with_info, add_info=True)        
         
         
This is the same function as get_direct_children(), except that you can add a
dictionary of metadata for each URI. In our case, we return the thumbnail
of each video.


.. code-block:: python

    def next_location(self, uri, root=None):
        """
        Return the uri just next to given uri and record it to history

        @param uri:             the URI representing the file or directory from
                                where to move on
        @type uri:              L{elisa.core.media_uri.MediaUri}
        @param root:            root URI
        @type root:             L{elisa.core.media_uri.MediaUri}
        @rtype:                 L{twisted.internet.defer.Deferred}                        
        """
        return self._def_action.insert_action(0, self._next_location, uri, root=root)

 
    def _next_location(self, uri, root=None):

        # The cache is keyed by the repr() of the parent URI
        if not root:
            root_str = repr(MediaUri(u'stage6:///'))
        else:
            root_str = repr(root)

        to_find = repr(uri)
        # is the root cached?
        if self._cache.has_key(root_str):
            for parent, children in self._cache.iteritems():
                # look if it is located under root
                if parent.startswith(root_str):
                    i = 0
                    while i < len(children):
                        # Each cache entry is a (child uri, info) tuple.
                        # Is that our uri?
                        if to_find == repr(children[i][0]):
                            # Check if there is a uri following
                            i += 1
                            if i < len(children):
                                # if yes, return it
                                return children[i][0]
                            break
                        i += 1

        return None

This function is useful for playlists, for example: it returns, from the cache,
the URI that follows the URI passed as parameter.
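
The lookup itself can be illustrated with a plain dict shaped like the cache
used in this tutorial, where each parent maps to a list of (child, info)
tuples. This is a simplified sketch: it searches a single parent only, the
name ``next_in_listing`` is chosen for the example, and the video ids are
made up.

.. code-block:: python

    def next_in_listing(cache, parent, current):
        """Return the child following current under parent, or None."""
        children = [child for child, _info in cache.get(parent, [])]
        if current not in children:
            return None
        index = children.index(current) + 1
        return children[index] if index < len(children) else None


    # Hypothetical cache content
    cache = {
        "stage6:///music": [
            ("stage6://video.stage6.com/1/.avi", {}),
            ("stage6://video.stage6.com/2/.avi", {}),
        ],
    }

    following = next_in_listing(cache, "stage6:///music",
                                "stage6://video.stage6.com/1/.avi")
    print(following)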


.. code-block:: python

        
    def uri_is_monitorable(self, uri):
        """
        Check if the uri is monitorable for modification
        
        @param uri: the URI representing the file or directory for
                    which we would like to know if it is monitorable or not
        @type uri:  L{elisa.core.media_uri.MediaUri}
        @rtype:     bool
        """
        
        # We cannot monitor the uri for a change
        return False


    def uri_is_monitored(self, uri):
        """
        Check if the uri is currently monitored for modification
        
        @param uri: the URI representing the file or directory for
                    which we would like to know if it is currently
                    monitored or not
        @type uri:  L{elisa.core.media_uri.MediaUri}
        @rtype:     bool
        """

        # Stage6 URIs are never monitored
        return False

We do not monitor Stage6 URIs for addition, removal or modification. Monitoring
is useful for local filesystem support, through the INotify interface for example.


.. code-block:: python

    def gst_source_element__get(self):
        """
        Provide a GStreamer source element that supports data
        retrieving for the uri schemes supported by the MediaProvider

        @rtype: L{gst.BaseSrc}
        """
        # We do not need one
        return None

If you can provide a GStreamer source element able to play back your URI
scheme, this is the place to do so.

### ben: Currently this method is NOT used! We have to talk about whether it
is needed and how to implement it. We should also mark it in the documentation!


.. code-block:: python

    def open(self, uri, mode=None, block=True):
        """
        Open a uri and return a MediaFile if the block keyword
        parameter is True. Otherwise we return a deferred which will be
        triggered when the media_file has been successfully opened.

        @param uri:     the URI to open
        @type uri:      L{elisa.core.media_uri.MediaUri}
        @param mode:    how to open the file -- see manual of builtin open()
        @type mode:     string or None
        @param block:   should the caller wait for media file operation
        @type block:    bool
        @rtype:         L{elisa.core.media_file.MediaFile}
        """

        # We cannot open 'tags'. We use the private helper here because
        # is_directory() returns a deferred, which is always true
        if self._is_directory(uri):
            return None
        
        # What we do here is convert the uri in its http form,
        # and ask the media_manager to provide a suitable component
        # - such as GnomeVFSProvider - to do the work for us
        uri = self.get_real_uri(uri)
        media_manager = common.application.media_manager
        if media_manager.enabled:
            media = media_manager.open(uri, mode, block)
        else:
            media = None
        return media
      
        
    # The close(), seek() and read() functions are handled by the
    # base class MediaProvider

    def copy(self, orig_uri, dest_uri, recursive=False):
        """
        Copy one location to another. If both URIs represent a
        directory and recursive flag is set to True I will recursively
        copy the directory to the destination URI.

        @param orig_uri:  the URI to copy, can represent either a directory or
                          a file
        @type orig_uri:   L{elisa.core.media_uri.MediaUri}
        @param dest_uri:  the destination URI, can represent either a directory
                          or a file
        @type dest_uri:   L{elisa.core.media_uri.MediaUri}
        @param recursive: if orig_uri represents a directory, should I copy it
                          recursively to dest_uri?
        @type recursive:  bool
        @rtype:           L{twisted.internet.defer.Deferred}        
        """
        # What we do here is convert the uri in its http form,
        # and ask the media_manager to provide a suitable component
        # - such as GnomeVFSProvider - to do the work for us
        orig_uri = self.get_real_uri(orig_uri)
        media_manager = common.application.media_manager
        if media_manager.enabled:
            media = media_manager.copy(orig_uri, dest_uri, recursive)
        else:
            media = None
        return media


    def move(self, orig_uri, dest_uri):
        """
        Move data located at given URI to another URI. If orig_uri
        represents a directory it will recusively be moved to
        dest_uri. In the case where orig_uri is a directory, dest_uri
        can't be a file.

        @param orig_uri: the URI to move, can represent either a directory or
                         a file
        @type orig_uri:  L{elisa.core.media_uri.MediaUri}
        @param dest_uri: the destination URI, can represent either a directory
                         or a file
        @type dest_uri:  L{elisa.core.media_uri.MediaUri}
        @rtype:          L{twisted.internet.defer.Deferred}
        """
        # What we do here is convert the uri in its http form,
        # and ask the media_manager to provide a suitable component
        # - such as GnomeVFSProvider - to do the work for us
        orig_uri = self.get_real_uri(orig_uri)
        media_manager = common.application.media_manager
        if media_manager.enabled:
            media = media_manager.move(orig_uri, dest_uri)
        else:
            media = None
        return media


    def delete(self, uri, recursive=False):
        """
        Delete a resource located at given URI. If that URI represents
        a directory and recursive flag is set to True I will
        recursively remove the directory.

        @param uri:       the URI representing the file or directory to
                          delete
        @type uri:        L{elisa.core.media_uri.MediaUri}
        @param recursive: if uri represents a directory, should I delete it
                          recursively?
        @type recursive:  bool
        @rtype:           L{twisted.internet.defer.Deferred}
        """
        # What we do here is convert the uri in its http form,
        # and ask the media_manager to provide a suitable component
        # - such as GnomeVFSProvider - to do the work for us
        uri = self.get_real_uri(uri)
        media_manager = common.application.media_manager
        if media_manager.enabled:
            media = media_manager.delete(uri, recursive)
        else:
            media = None
        return media


Here we rely on the HTTP MediaProvider (GnomeVFSProvider for example) to do the
work for us, by converting our URI to an http scheme.

This is basically all you need to write a complete, working MediaProvider.
To test it, you will need to either add the Stage6 component to
an existing plugin or create your own. Here is an example of a Stage6 plugin.
You can save this file as stage6_plugin.conf and put it in the same directory
as your stage_media.py file:

::

    [general]
    name = 'stage_plugin'
    version = '0.1'
    plugin_dependencies = []
    external_dependencies = ['BeautifulSoup']
    description = 'Stage6 video provider component'

    [stage_media:StageMedia]
    description = 'MediaProvider for stage6:// scheme'
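
The values in this file look like Python literals. How Elisa itself loads the
file is not covered here. The sketch below only checks that the fragment is
well formed, using ``configparser`` and ``ast.literal_eval`` from the Python 3
standard library, which is an assumption and not Elisa's loader.

.. code-block:: python

    import ast
    import configparser

    PLUGIN_CONF = """\
    [general]
    name = 'stage_plugin'
    version = '0.1'
    plugin_dependencies = []
    external_dependencies = ['BeautifulSoup']
    description = 'Stage6 video provider component'

    [stage_media:StageMedia]
    description = 'MediaProvider for stage6:// scheme'
    """.replace("\n    ", "\n")  # tolerate the indentation of this listing

    parser = configparser.ConfigParser()
    parser.read_string(PLUGIN_CONF)

    # Values are written as Python literals, so literal_eval recovers their types
    general = {key: ast.literal_eval(value)
               for key, value in parser["general"].items()}
    # Every section other than [general] names a component
    components = [name for name in parser.sections() if name != "general"]

    print(general["name"], general["external_dependencies"], components)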


You will also want to add a Stage6 path to the locations property of an
Activity, such as VideoActivity.

To do so, open your Elisa config file and add 'stage6:///' to the 'locations'
of your video activity, as shown below:


::

 [base:video_activity]
 locations = ['stage6:///', 'stage6:///music', 'file://./sample_data/movies/']
