Architecture and integration



Internal Note



This document addresses architecture and integration issues related to the building of the EASAIER prototype.

Version 1.5 Draft

Date: August 24th 2006

Editor: Silogic, DIT

Contributors: Luc Barthélémy, Laure Bajard, Dan Barry, Chris Landone, Chris Cannam

Table of Contents


1. Introduction

2. Global Requirements

2.1. Archives

Sound archives

Other archives

2.2. EASAIER users

2.3. Access mode / level of service

3. Applications Requirements

4. Server side

4.1. Server side application definition

4.2. Greenstone framework

4.3. Fedora

4.4. Others

Conclusion

4.5. Indexation modules

4.6. Retrieval-matching modules

4.7. Editing modules

5. Client side

5.1. Client side application

5.2. Mockups

5.3. Modules integration

5.4. Library and API

Fmod

Steinberg VST SDK

IPP

Vamp plug-in

DirectX

M2K

Marsyas

Java sound API

Maaate


1. Introduction

This document presents some of the issues we have to consider in building the EASAIER system.

This document will not solve them all; input from the involved partners is needed to build a robust and efficient solution for the prototype.
Choices about the services EASAIER provides will also be set out.
First we enumerate some of our requirements, then try to deduce the possibilities and constraints they impose on the architecture.

2. Global Requirements

2.1. Archives

Sound archives

The EASAIER sound archive will be hosted on a server allowing access by multiple clients. This archive contains audio resources linked to multimedia content.

  Note: Platform for the EASAIER archive.

  • We can specify which platforms we target for hosting the archives,

  • Compatibility should be kept in mind, but we cannot test and validate the EASAIER system on several platforms during the project.

  Note: Remote access to archive materials for indexation or browsing implies:

  • Management of security,

  • Access rights (DRM).

We are working with the idea that a copy of the audio file is never transferred to the client computer in a generic audio format. Only the EASAIER user interface (client side) will have access to the audio file from RAM. No local copies will ever be stored to disk. The content is only accessible to the user during an online session. The archives themselves need to confirm what exactly the issues with DRM will be.

 We could store files to disk if only the EASAIER client is able to decrypt them, but this has to be verified with the content provider.
 Can we guarantee that all audio files are small enough to fit in memory?

How does this policy work for video delivery, which will presumably need to be streamed with a local disk cache?

I don't think we can operate on the principle that it's impossible for the client to obtain a copy. A sophisticated user will almost certainly be able to extract both audio and video from the client whatever we do. We can mitigate this by (1) streaming and buffering rather than explicitly saving; (2) excerpting from copyrighted materials rather than delivering them complete; (3) delivering lower-quality encodings than the originals; (4) encryption, although I think an effective DRM policy is not in scope for this project. In every case it will still have to be up to the archive manager to ensure the policy matches what they're permitted to do; we probably can't guarantee anything.
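As a sketch of the "RAM only" delivery policy described above, the client would accumulate the incoming stream in an in-memory buffer that is never written to disk; when the session object is dropped, the audio disappears with it. This is purely illustrative (Python, a hypothetical `stream_to_memory` helper, and a generic chunked stream are all assumptions, not agreed design):

```python
import io

def stream_to_memory(chunks):
    """Accumulate streamed audio chunks in RAM only; nothing touches the disk.
    `chunks` is any iterable of bytes (e.g. read from a network socket)."""
    buffer = io.BytesIO()          # in-memory only; no file handle is opened
    for chunk in chunks:
        buffer.write(chunk)
    buffer.seek(0)
    return buffer                  # the player reads decoded audio from here

# The buffer is garbage-collected when the session ends, so no local copy
# of the audio survives the online session.
```

Note that this only enforces the policy against casual copying; as argued above, a sophisticated user can still capture the decoded audio.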

  Note: For the prototype / integration:

  • Use of remote archives?

  • Duplicate some archives and deploy them on the prototype network?

 We will most likely duplicate some portion of an archive and deploy on a prototype network for testing.

  Note: Multiple EASAIER archives.

  • We may open the system to allow a client to submit a query to multiple archives simultaneously, assuming they all use EASAIER.

 The system we develop will not necessarily allow multiple separate archives to link. ***BUT***, it is feasible that multiple audio archives which use the EASAIER access system could have cross access to each other. For example, a user could submit a single query to multiple archives simultaneously assuming they all use EASAIER. Our focus however will be on the model where there is a single archive with multiple remote users. The fact that cross archive access is possible is added value.

Other archives

EASAIER will also enable access to materials and media other than purely sound archives; cross-media retrieval is one of the features that current digital libraries lack.
 EASAIER will only facilitate audio archives, but will support multimedia content within audio archives, which often have non-audio content as well. Search facilities will focus on audio, but non-audio content linked manually (by the archive administrator) can also be returned with search results.

  Note: Multimedia storage.

  • Are cross-media elements stored in the same archive as the audio resources, or just linked as remote documents?

 The non-audio media content will be stored in the same archive.

 The focus is specifically on audio; multimedia content support is added value. The multimedia content is assumed to reside in the same archive as the audio. Examples of the sort of non-audio media these audio archives hold are: video of musicians, images of posters and concert tickets, and text in the form of stories or liner notes about the recordings.

  Note: Other media.

  • What kinds of extra media do we cover? Text, video, images?

 We should limit the prototype to the following media: audio, video, image and text. The majority of the automatic indexing will be for the audio only. Most other media types will likely be linked manually by the archive administrator.
 What about chords?

  Note: Other archives.

  • What kinds of extra archives do we cover? Text, video, chords, freeDB?

 We do not support non-audio archives. Each instance of use of EASAIER is between a single archive and its clients. A single audio archive is likely to have other content such as video and images. This is the cross-media aspect of the project.

Again, if multiple archives use EASAIER, these archives could achieve cross access, since the interface is likely to be the same in all cases.

  Note: For the prototype / integration:

  • Use of remote archives?

  • Duplicate some archives and deploy them on the prototype network?

 We will most likely duplicate some portion of an archive and deploy on a prototype network for testing.

2.2. EASAIER users

EASAIER will propose rich access to media archives, providing innovative retrieval and visualisation tools. For now we can target different kinds of users interested in browsing such archives:

  • Content manager

  • Experts in music, such as music students: need strong visualisation and annotation tools.

  • General end-user.

 This is a typical case where input from the expert users group would have been invaluable in order to pin down the functionality of the system before launching ourselves into architectural design mayhem. It seems that it will be some time (3-6 months) before we get anything from them, so we’ll have to figure some basic functionality out ourselves.

I feel, however, that assuming two different types of users at this stage is rather premature and might lead to an unnecessary complication in the system’s initial design. For the moment it would probably be a good idea to concentrate on a client that provides all of the visualisation and enriched access features, and then figure out who gets to see what at a later stage.

I’d like to remind everyone that the technical annex only gives a basic overview of what the system is supposed to do; we still have to formally define a set of functional requirements!

2.3. Access mode / level of service

Different access modes have been expressed in the Technical Annex and in the expressions of interest of the expert users, or can be deduced from the definition of users.

 From the technical annex, the discrimination between expert and general user is not explicit. I don’t agree with this distinction, as the archives we’re targeting are probably used more by the “experts” than by somebody simply coming across a web site.

  1. Web-based access

  • A web-based access is mandatory to address general end-users. It allows fast access to archives, with simple navigation and retrieval tools.

  • => Implies a Web server

 The “hotbed” system is already in place; maybe we could explore ways to interface it to the EASAIER system rather than start from scratch.

  2. Enriched navigation

  • If we address enriched navigation, with several visualisations and synchronisation between different media, a more complex client is needed.

  • This could be a downloadable executable?

 Yes, absolutely! The processing required to provide the enriched access tools should be carried out on the client side. The client application should be a standalone application similar to a browser, but customised for EASAIER. The client side application will probably do all the audio processing and media synchronisation. The server will simply respond to queries and return content.

 Or we could keep the client side in the form of a web browser with Java applets for the looping and other interfaces, or embed a custom player with just the time-scaling algorithms included.

I think the issue of deciding where the bulk of the signal processing should be carried out is becoming quite urgent, and I can foresee this “getting ugly” in the long term.

In a nutshell, if we have to have a war, I’d rather have it now ;) so I’ll start by outlining some thoughts we’ve had at QMUL in the past weeks. Please feel free to thrash them and/or add your own.
Approach 1: Enriched access is enabled by the client. This means that the server sends only the original audio stream and all of the signal processing is carried out by the user’s machine.

Pros:

    • general reduction of processing and storage burden on the server side.

    • if the enriched access tools (e.g. denoising and source separation) require fine tuning, the end user will be able to tweak the parameters in order to get better results.

Cons:

    • requires a customised application running on the user’s machine, potentially making cross-platform compatibility a major issue and/or requiring maintained code for different OSes, CPUs, etc.

    • depending on the machine’s speed, the user might experience some frustrating delays.

    • file transfer could require higher bandwidth, as algorithms might need full stereo and quasi-transparent audio quality to work correctly.

Approach 2: Enriched access is enabled by previously cached audio files on the server. De-noised and source-separated files can be generated and stored on the server during the archiving process.

Pros:

    • the client side can be a relatively dumb machine where only basic visualisation and browsing are implemented, making it easier to implement an OS-independent tool (web browser + Java).

    • assuming that denoising and source separation occur during the archiving procedure as pre-processing stages in order to facilitate feature extraction (to be verified), it would make sense to store the result locally.

    • the end user has instantaneous access to the enriched content.

    • may be a better fit with copyright management policy (e.g. delivering lower quality or partial material).

Cons:

    • an increase in storage requirements per archived asset. This can be kept relatively small if these “offspring” are stored in a compressed format.

    • time-scaling tools would still be needed on the client side, although the possibility of implementing them on the server side should be explored.

    • an increase in indexation complexity.

Decision: (regarding the approaches)

  • What OS do we target for a local client (Windows, Linux, Mac)?

 For the prototype we suggest Windows.

 As far as the prototype is concerned I have no qualms; however, I don’t feel particularly at ease with forcing people/institutions to subscribe to a specific OS, especially when portability is perfectly feasible – if we decide to go for a custom application rather than a web browser approach, there are perfectly good solutions for a cross-platform implementation (e.g. Qt & wxWidgets).

Also, as a “minor” political consideration, we all know the EU commission’s feelings for Microsoft’s monopoly ;)

 I fairly strongly feel we should use a cross-platform toolkit (e.g. Qt, or language, e.g. Java) if building standalone applications for the client side. There are plenty of options. I also feel we should aim to use the same for the prototype as for the final product (or perhaps I should say, I expect that the final product will end up being produced with the same technology as the prototype). See also notes to 5.2 below.

  3. Export / Annotation. For experts, the tools become even more complex.

  • Here again a local client seems mandatory (playing several media…).

  • In this case we also have some performance issues to address for annotation, computing indexes, etc.

 Can you clarify this please, I’m not sure what you mean?

 I understand that generally annotation is done server side. But for the experts, we give them some tools to re-compute or change annotations. I see two possibilities here:

    • Annotations are still done server side but driven from the expert application,

    • Or annotations are done locally and the results are sent to the server. In this case, the tools we build for annotation are deployed not only on the server side but also on the experts’ workstations.

 Good Idea!

  • Annotation is made server side and driven from the expert’s computer, OR is made on the expert’s computer and posted to the server. This is also related to performance issues.

 The annotation/indexing should happen server side at archive setup time. We intend to create tools which will automatically annotate many features of the audio content, but there will also be a requirement for manual annotation of certain fields such as artist name, track name and year. These fields will most likely be filled in by an expert at the archive (server side).

 What exactly is the coverage of the expert annotations? Do they only enter some specific fields (artist name, track…) or do they also have access to complex annotations made server side?
 I think the experts/administrators should have the ability to edit all fields manually in case the automatic tools fail to give a correct answer.
 Annotation by users, whether expert or general, is beyond the scope of the system. We shouldn’t try to implement a collaborative annotation system like BOCA. I fear the worst if we also have to manage users’ contributions; just imagine implementing the review process… Noooooo!

EASAIER is complex enough as it is …

We do have to provide the “archiving client”, though: something that allows individuals with administrator rights to remotely enter and amend data and content on the server. This tool can be a Windows app, I have no objections ;)
 We need to arrive at a conclusive list of meta-descriptors which we will use to tag each audio object. This will be primarily the job of QMUL and DIT.
 Agreed, that is the ontology result.
 We also need to decide whether the metadata will be embedded in each media file or centralised in an XML document, for example. I favour the idea of the XML document.
 Agreed, see Note 16.

 In this way the actual metadata could be stored client side and updated each time the user logs on to the server. This way the query and retrieval could also be done client side, making the process even faster. Browsing could even be done offline. Content retrieval, of course, would require the user to be online: the actual content is only retrieved when the user makes a selection from the list of search results. The popular (and soon to be illegal) music download site uses a method very similar to this with great success.
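A hypothetical sketch of how such a client-side metadata cache could be refreshed at login: records in the centralised XML document are matched by an `id` attribute and the newer `version` wins. The schema (record/id/version attributes) is invented for illustration, not an agreed format:

```python
import xml.etree.ElementTree as ET

def merge_metadata_cache(cached_xml, server_xml):
    """Merge the server's metadata document into the client's local cache.
    Records are matched on their 'id' attribute; the newer 'version' wins.
    Both arguments are XML strings; a merged XML string is returned."""
    cache = {r.get("id"): r for r in ET.fromstring(cached_xml)}
    for record in ET.fromstring(server_xml):
        rid = record.get("id")
        old = cache.get(rid)
        if old is None or int(record.get("version")) > int(old.get("version")):
            cache[rid] = record     # take the server's newer record
    root = ET.Element("records")
    root.extend(cache.values())
    return ET.tostring(root, encoding="unicode")
```

With this shape, browsing and querying can run against the local cache while offline; only content retrieval needs the server.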

3. Applications Requirements

Once we have decided the global architecture, we can move to the details.

We can divide EASAIER tasks into functional areas. These areas often correspond to functional modules that cooperate to resolve complex tasks. These modules exchange information, commands, etc.
As stated in the Technical Annex, we plan to release open source software. We will not redevelop in EASAIER components that are already well done, but will focus on taking the community’s tools a step forward.
Therefore, EASAIER will be built on existing open source components. Some existing solutions are presented in the next sections.
 Windows is proposed as the target platform for the EASAIER project, both server and client side. Nevertheless, where possible we should keep portability in mind.
 As far as the server side is concerned I would favour any platform Silogic feels comfortable with.
 Agreed

4. Server side

4.1. Server side application definition

Server side application – This application should facilitate both the automatic and manual indexing, annotation and linking of the content within the archive. It should also provide basic archive management tools for the archive administrator to update the archive’s online contents. The application should facilitate the initial setup of the archive for online access, as well as provide the archive management tools once the archive is live. Essentially, the owner/administrator of an archive uses this application to “curate” the archive.
This application contains the following features:

  • Database management to store information: audio resources, metadata and extra materials (multimedia).

  Note: Do we need to store some audio data related to the original source, like enhanced recordings or audio separation files?

 This is a good question. Answer: No. My personal feeling is that we ONLY store the “raw audio data” in unprocessed form. Any server side process is ultimately for the purposes of generating metadata. In the case of audio separation files, the end user (client) can perform audio separation client side at access time if he/she wishes. *Note – audio separation is both a tool for metadata generation (server side) and an enriched access tool (client side). The very same tool is used for two different purposes.

  Note: If yes, the size of the DB can be very important.

  • Indexation: tool for automatic or manual indexation,

  • Querying: to extract information with some specific matching metrics.

  • Web services: for web-based access, the application should provide simple navigation and retrieval tools.

  • Administration: to set-up and maintain archives,

  • Remote access: the content should be accessible and editable from a client application (on the user’s local computer). Thus, a layer should export the application’s features for remote access.

  Note: Remote access could also include administration.

  • Portability: this is not mandatory, but an added value.

4.2. Greenstone framework

It has been proposed to use Greenstone as the storage layer.

Greenstone is an open source project dedicated to digital library collections. It offers some interesting features to create, index and organize digital collections.
Particularly in our case:

  • It uses plugins (perl) for metadata indexation. It means we can develop our own plugins and have them called by Greenstone framework when a new document is added,

  • Metadata are stored in an XML format, allowing them, for example, to be stored client side.

  • It offers basic web services for online browsing.

After a look at Greenstone 3, it appears to cover more functions than only the storage layer:

  • It offers the same possibilities as Greenstone 2 (storage, indexation…),

  • It adds the possibility to develop our own front-end (client application) and communicate with Greenstone using SOAP and XML.

  • It can be distributed on several servers.

  • It is multiplatform (java based)

A risk is that this version is still in development and is today unstable (July 2006). A first version of Greenstone3 was released in November 2005.

The resources devoted to the project are small:

  • One developer referenced on sourceforge,

  • Mailing list shows very little activity.


4.3. Fedora

Fedora (not Red Hat) is an acronym for Flexible Extensible Digital Object Repository Architecture. Fedora’s flexibility makes it capable of serving as a digital repository for a variety of use cases. Among these are digital asset management, institutional repositories, digital archives, content management systems, scholarly publishing enterprises, and digital libraries. Fedora is open-source software licensed under the Mozilla Public License.

It offers some interesting features:

  • Service-oriented architecture

  • Repository access and management as web service

  • Flexible Digital Object Model

  • Digital Object Identifiers and URIs

  • Digital Object Relationships

  • Datastream Versioning

  • XML-based Ingest and Export

    • FOXML Ingest Example

    • METS Ingest Example

  • XML-based Digital Object Storage

  • Basic Search

  • RDF-based Resource Index with Search

  • Server Command Line Utilities

  • Object Disseminators

    • Binding to Fedora Local Services

    • Binding to Remote Web Services

  • OAI Provider Interface

Fedora was first released in 2002 and is funded until end of 2007. The activities on the mailing list and on the development side seem promising.


4.4. Others

Many other solutions exist. One can refer to the analysis by the Sheridan Libraries at Johns Hopkins University on repositories and services. This analysis tested the following solutions:

  • DSpace

  • Fedora

  • DigitalCommons

  • JSR 170


  • ECL

The table below gives some input on the main solutions: Greenstone, DSpace and Fedora.




Designed for
  Greenstone: allowing non-specialist users to produce single, individualized collections from existing resources.
  DSpace: institutional setting, where members of faculty submit their documents to a common system that enforces common standards.
  Fedora: gives organizations a flexible service-oriented architecture for managing and delivering their digital content.

Data preservation
  Greenstone: not supported.
  DSpace: long-term preservation; format continuity.
  Fedora: content versioning.

Support infrastructure: –

Platforms
  Greenstone: all (Windows, Mac OS/X, Unix, Linux).
  DSpace: Windows XP, OS/X, Unix, Linux.
  Fedora: –

End users
  Greenstone: librarian-oriented: permits designed and built collections.
  DSpace: author-oriented: allows document submission to the system.
  Fedora: allows document submission to the system.

Metadata
  Greenstone: existing metadata standards + build your own metadata scheme.
  DSpace: single metadata standard.
  Fedora: existing metadata standards + build your own metadata scheme.

Distribution on removable media
  Greenstone: –
  DSpace: not possible.
  Fedora: –

Dynamic collections
  Greenstone: integrated in V3.
  DSpace: –
  Fedora: –

Authentication / security / submission control per user group: –

Metadata norms
  Greenstone: –
  DSpace: OAI, Dublin Core, METS (export only).
  Fedora: OAI, Dublin Core, METS, FOXML.

Web services: –

Interface languages
  Greenstone: 35 interface languages available.
  DSpace: –
  Fedora: –

Licence
  Greenstone: GNU General Public License.
  DSpace: BSD license (non-heritage).
  Fedora: Educational Community License 1.0.

Plugin integration (a. indexation, b. search)
  Greenstone: V2: a. OK, b. ?; V3: a. OK, b. ?
  DSpace: a. import function? b. ~no.
  Fedora: a. OK, b. OK.

Remote exe (client side): –

Activity
  Greenstone: 91.68% (SourceForge); last version: 11/2005.
  DSpace: –
  Fedora: 99.21% (SourceForge); last version: 07/2006; 1st version: 05/2005.


Conclusion

Regarding the above solutions, it seems to Silogic that Fedora is a good choice for the EASAIER project. It offers:

  • a good repository (tested with millions of objects),

  • XML based storage,

  • Basic web services for default collection browsing,

  • Disseminators to customise how object streams can be accessed,

  • Compatibility with METS, OAI interface

  • SOAP or REST APIs for client access for browsing or management,

  • Security layer.


4.5. Indexation modules

Once we associate a new resource (an audio track) with the system, it computes metadata for indexation. The number of modules computing these data can vary. Also, some of them could be chained together, and one may depend on the output of a previous one.

 Yes
We have to define an integration layer of these modules to:

  • Add new indexation modules,

  • Associate the storage elements for the metadata,

  • Set up the modules’ parameters,

  • Describe the dependencies between them (workflow?)

This information could be stored in an XML based format.
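As an illustration of how such an XML workflow description might look and be consumed, a host could derive the execution order from the declared dependencies with a simple topological sort. The module names and the `depends` attribute below are invented for the example, not an agreed schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical workflow description: each <module> names an indexation
# plugin and, optionally, the modules whose output it depends on.
WORKFLOW_XML = """
<workflow>
  <module name="beat_tracker"/>
  <module name="key_finder"/>
  <module name="segmenter" depends="beat_tracker key_finder"/>
</workflow>
"""

def execution_order(xml_text):
    """Return module names in an order that satisfies the declared
    dependencies (a simple topological sort)."""
    deps = {m.get("name"): set((m.get("depends") or "").split())
            for m in ET.fromstring(xml_text)}
    order = []
    while deps:
        # a module is ready once all of its dependencies have been scheduled
        ready = [name for name, d in deps.items() if d <= set(order)]
        if not ready:
            raise ValueError("circular dependency in workflow")
        for name in sorted(ready):
            order.append(name)
            del deps[name]
    return order
```

The same XML file could also carry each module's parameters and the storage elements for its metadata, as listed above.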

4.6. Retrieval-matching modules

Once the metadata are created in the storage, we have to use them to build an accurate retrieval system. We need to compute distances, or validations, between these metadata and other metadata or values. Some comparisons may be trivial, like text matching; others may need more complex matching criteria.

Thus, for each metadata field, we need to specify which tool is used to compare it.
A “logical” layer above should combine these comparisons with logical operators for querying.
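A minimal sketch of this idea: each metadata field carries its own matching metric, and the logical layer simply combines the per-field results with AND/OR. The field names and thresholds are illustrative only; a real audio-similarity metric would plug in the same way:

```python
# Each metadata field is paired with the matching metric used to compare it.
# Exact (case-insensitive) text match and a numeric tolerance are shown here.
MATCHERS = {
    "artist": lambda stored, query: stored.lower() == query.lower(),
    "tempo":  lambda stored, query: abs(stored - query) <= 5,  # +/- 5 BPM
}

def matches(record, query, combine=all):
    """Apply each field's own metric, then combine the per-field results
    with a logical operator: `all` for AND, `any` for OR."""
    return combine(MATCHERS[field](record[field], value)
                   for field, value in query.items())

record = {"artist": "Miles Davis", "tempo": 120}
```

For example, `matches(record, {"artist": "miles davis", "tempo": 123})` succeeds under AND, while swapping in `combine=any` gives OR semantics over the same per-field metrics.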

4.7. Editing modules

Do we need some editing modules for audio experts? What data is changed or edited? Metadata? –

 In general, experts should change only metadata. The archive should provide an approved “edit” of the audio file at setup time. There are plenty of “off the shelf” tools which would allow an expert to edit the audio before it is placed in the online archive.

5. Client side

5.1. Client side application

Client side application – This application will provide the query and retrieval interface that the user will see.

  Note: Simple access with a web browser is also provided, using the web services.

This application should facilitate easy browsing and searching for content. Once the desired content has been retrieved, the user should then be able to process, interact with and visualise the content in useful ways. These processes will be quite processor intensive and so should take place on the client side.

 Agreed.
The application should be some sort of standalone .exe which can access internet ports for retrieval of data remotely from some archive server.
 Yes, see Remote access layer on the server side.
 Once the data is with the client, local resources can be used to interact with and process the content. No physical copy of the data is ever made; the data dies when the session ends. The client side .exe will most likely be downloaded from the particular archive’s website.

The client side application will provide enhanced tools for browsing the archive; these include:

  • Intelligent interface to improve user archive browsing experience,

  • Enriched content access through multimedia,

  Note: An XML-based indexation format could allow browsing the library with some pre-cached data: when the user browses the archive, the application’s cache is updated with server information.

  Note: Do we plan to compute some complex operations, like audio separation, on the local client? This is important as it means, for example, that some indexation components are not only deployed server side but should also be pluggable into the client application.

 There is a possible scenario where the user (client) “owns” a song which resides client side. They wish to “search for similar sounding songs”. For this, the automatic metadata generation tools would also need to exist client side. So in answer to your question, Yes…


5.2. Mockups

 Shortly, I will also provide some possible ideas for general program flow. I’ll try to provide some user scenarios and the sequence of screens which should lead a user to the content they require. I will provide as many GUI mock-ups as possible.

 That’s perfect.

5.3. Modules integration

 Some developers at QMUL use Linux/Java and some at DIT use C++ and/or Java. NICE are specifically C++. Can we integrate all of these? In general we suggest that each partner involved in audio processing provides code in the form of DLLs. For this we will need to specify the I/O for each DLL in order to ensure compatibility with the host application provided by Silogic.

 QMUL C4DM uses mostly OS/X as a development platform, with some Linux. Language is generally C++ (and of course e.g. Matlab). Java is not very widely used.

Perhaps worth noting that at the moment we have zero Windows development desktops (one laptop). This is one reason we would favour a cross-platform environment for development and testing, even if the end result is deployed on Windows.

 I totally agree with you.

We have to define precisely:

  • The platform (today let’s say Windows),

  • The API or communication layer between our modules,

 Very soon I will provide you with a very detailed document concerning the audio tools and elements which will be provided by DIT. These are outlined in WP4 and WP5. At this early stage we are in favour of the idea that each of these audio processing modules be developed and compiled as a DLL. As such, we will need to specify the I/O structure for each DLL. We will give reasonably detailed GUI mock-ups which encapsulate the layout and functionality of each module, but ultimately the design and “look and feel” will be up to Silogic.

 The 3 main inputs to our DLLs will most likely be:

  • A pointer to the audio which has been retrieved and is currently client side.

  • An object or class containing the metadata about the audio. This metadata will also include links to related media.

  • An object or class which contains the current parameters which have been set by the user on the module GUI. (E.g. the current position of 3 sliders and a radio button)

 The main outputs from our DLLs will most likely be:

  • A pointer to the processed audio for playback on the client’s machine.

  • Pointers to objects or arrays containing information for graphical user feedback.

(e.g. data which can be plotted to give the user information for parameter selection).

This is an early assessment and overview of the possible route. We will clarify and detail this as time goes on.
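To make the proposed I/O structure concrete, here is a hypothetical mirror of it as plain data structures (Python for illustration only; the actual DLL interface would be C/C++, and every name below is an assumption, not an agreed signature):

```python
from dataclasses import dataclass, field

@dataclass
class ModuleInput:
    audio: bytes                 # pointer-equivalent: the retrieved audio, client side
    metadata: dict               # descriptors plus links to related media
    gui_params: dict             # current user settings (e.g. 3 sliders, a radio button)

@dataclass
class ModuleOutput:
    processed_audio: bytes       # for playback on the client's machine
    display_data: list = field(default_factory=list)  # data to plot for user feedback

def run_module(process, inp: ModuleInput) -> ModuleOutput:
    """Host-side call: hand the module one input bundle, get one output bundle.
    `process` stands in for the DLL's entry point."""
    audio, display = process(inp.audio, inp.metadata, inp.gui_params)
    return ModuleOutput(processed_audio=audio, display_data=display)
```

The point of fixing this structure early is that Silogic's host can then load any partner's module without per-module glue code.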

 A note about DLLs: we are currently using a dedicated API for the feature extraction modules, have a look at

These “Vamp” plugins output complex multidimensional data with labels. The API is well specified and documented, although not completely general.

Dan, you mentioned VST and DirectX plugins; I assume you had the enriched access tools in mind, as these are “data” transducers primarily used for audio processing. There are some issues regarding the adoption of these established APIs – such as not being open source and, especially for DirectX, not being portable to anything other than Windows.

I’d suggest we use a proprietary API or, even better, go for LADSPA, there are plenty of apps in the open source space that use this API and loads of hosts that support it.

 Note that you cannot legally use the Steinberg VST SDK in GPL software. (At best the legal situation is ambiguous, with the ambiguity being over whether the Steinberg headers are copyrightable at all rather than over interpretation of their licensing terms, which are quite clearly intended to be incompatible with licenses like the GPL.) If the GPL is our intended license (is it?) then some other API must be used.
Of course if someone would like to reverse engineer the VST SDK so as to make it possible to write GPL'd VST hosts and plugins, then great! I can't do that myself as I've already agreed to its licensing terms.
LADSPA has the advantage of being basically a minimal subset of the other real-time audio processing plugin APIs, so it's very easy to port code to and from it. Despite the L in its name there is nothing Linux specific about it.
Regardless of API, where possible audio processing modules should be structured to work on a frame-by-frame (real-time) basis for maximum versatility.
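A small sketch of what frame-by-frame structuring means in practice: the host feeds the module successive fixed-size frames, so the module never needs the whole file at once and can run in real time. This is plain Python for illustration (a real module would operate on audio buffers through whichever plugin API we choose):

```python
def process_stream(samples, frame_size, process_frame):
    """Run `process_frame` over successive fixed-size frames, as a
    real-time host would: the module never sees the whole file at once."""
    out = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        out.extend(process_frame(samples[start:start + frame_size]))
    return out

# Example module: a simple gain stage, expressed frame by frame.
def gain_frame(frame, g=0.5):
    return [s * g for s in frame]
```

Modules written this way drop naturally into LADSPA-style hosts, which hand plugins one buffer at a time.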

Decision: (regarding the API)

5.4. Library and API

Here are some libraries we could use for EASAIER.

 Please could you give your feedback / opinion on these tools, so we can select them in Dublin?


Fmod

FMOD is a cross-platform audio library and toolset to let you easily implement the latest audio technologies into your title.

  • Sound designer focus and tool. The new suite of tools and functionality means FMOD is usable by sound designers and musicians and not just programmers. Sound authors will have the ability to create complex audio models and tweak them in real-time over the network (or even internet) while the game/application is still running!

  • Full 3D sound support including linear/nonlinear/custom rolloff models, multiple listener support, occlusion and obstruction (using real polygons!), sound cones, and support for stereo or multichannel samples being played in 3D!

  • Virtual voices to allow a game to play thousands of sounds at once on limited hardware without worrying about handling the logic to switch sounds off and on themselves.

  • Support for over 20 file formats.

  • Supports 13 platforms

Licence: free for non-commercial use; otherwise $6000 per platform plus $3000 per extra platform

Steinberg VST SDK

The VST Module Architecture is an object-oriented, cross-platform and compiler-independent interface model. It specifies what components must "look like" and how they are created by the host application.

Source code: C++

Drawback: incompatible with GPL licences


IPP

Intel® Integrated Performance Primitives (Intel® IPP) is a library of thousands of multi-core-ready, highly optimized software functions for multimedia and data processing applications.

Optimised for Intel processors; is this of interest to us?

Source code: C/C++

Price: $199

Free version: IPP PLSuite (an old version of IPP); free download for Linux until August 2006 (non-commercial purposes only).

Vamp plug-in

Vamp is an audio processing plugin system, intended for plugins that extract descriptive information from audio data. Vamp is the primary analysis plugin format used by Sonic Visualiser. Multiplatform

Source code: C/C++

Licence: permissive BSD-style
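As an illustration of what "extracting descriptive information from audio data" amounts to in practice, the sketch below computes one RMS energy value per block, the kind of low-level feature an analysis plugin of this sort would return. It deliberately uses no Vamp SDK types; `RmsExtractor` is an invented name for illustration only.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative feature extractor (not actual Vamp SDK code): it consumes
// audio one block at a time and emits one descriptive value per block,
// here the RMS energy, much as an analysis plugin would.
class RmsExtractor {
public:
    // Process one block of samples and record its RMS energy.
    void process(const float* block, std::size_t frames) {
        double sum = 0.0;
        for (std::size_t i = 0; i < frames; ++i)
            sum += static_cast<double>(block[i]) * block[i];
        m_features.push_back(std::sqrt(sum / static_cast<double>(frames)));
    }
    // All features computed so far, one value per processed block.
    const std::vector<double>& features() const { return m_features; }
private:
    std::vector<double> m_features;
};
```

The host (e.g. the indexation modules) would feed every block of a recording through such an extractor and store the resulting feature sequence alongside the audio.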



DirectX

Drawback: not portable (Windows only)


LADSPA (Linux Audio Developer's Simple Plugin API)

Many audio synthesis and recording packages are in use or in development on Linux. These work in many different ways. LADSPA provides a standard way for `plugin' audio processors to be used with a wide range of these packages.
Source code: C/C++

Licence: LGPL

Advantages: “LADSPA has the advantage of being basically a minimal subset of the other real-time audio processing plugin APIs, so it's very easy to port code to and from it. Despite the L in its name there is nothing Linux specific about it.” (Multiplatform)
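The LADSPA calling convention (connect buffers to numbered ports, then call run() once per block) can be sketched without the real ladspa.h header. The struct below is a simplified stand-in for illustration; it is not the actual LADSPA_Descriptor interface, and the port numbering is hypothetical.

```cpp
#include <cstddef>

typedef float Sample;  // stands in for LADSPA_Data, which is a plain float

// Simplified stand-in for a LADSPA-style plugin instance: the host first
// wires input, output and control buffers to numbered ports, then calls
// run() once per audio block. Port numbers here are invented.
struct AmpPlugin {
    const Sample* input  = nullptr;  // port 0: audio input buffer
    Sample*       output = nullptr;  // port 1: audio output buffer
    const Sample* gain   = nullptr;  // port 2: control value, read once per block

    void connect_port(unsigned long port, Sample* data) {
        switch (port) {
        case 0: input  = data; break;
        case 1: output = data; break;
        case 2: gain   = data; break;
        }
    }

    void run(unsigned long sample_count) {
        for (unsigned long i = 0; i < sample_count; ++i)
            output[i] = input[i] * *gain;
    }
};
```

Because the whole contract is just "connect ports, then run", porting a module written this way to or from VST or DirectX hosts mainly means adapting the wrapper, not the processing code.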


CoMIRVA

The CoMIRVA project aims at building a framework for Java implementations of various algorithms concerning music, multimedia, information retrieval, information visualization, and data mining. At the moment, only a preliminary version of CoMIRVA is available. It is planned to include more algorithms for extracting features from audio data, from the Internet, or from other sources, as well as various functions for processing these data. The current implementation mainly focuses on data handling and visualization.

Source code: Java

Licence: GNU GPL


A low-level feature extraction plug-in API.

Source code: C/C++

Licence: BSD-style

Activity: 94.82 %, registered February 2005 - last version June 2005


M2K (Music-to-Knowledge): A tool set for MIR/MDL development and evaluation

Source code: Java

Licence: specific licence for commercial use


MARF

MARF stands for Modular Audio Recognition Framework. It contains a collection of algorithms for Sound, Speech, and Natural Language Processing arranged into a uniform framework to facilitate the addition of new algorithms for preprocessing, feature extraction, classification, parsing, etc. MARF is also a research platform for various performance metrics of the implemented algorithms.

Source code: Java

Licence: BSD-style

Activity: 99.57 %, registered September 2002 - last version July 2006


Marsyas

Marsyas (Music Analysis, Retrieval and Synthesis for Audio Signals) is an open source software framework for audio processing with specific emphasis on Music Information Retrieval applications.

Source code: C++

Licence: GNU GPL

Activity: 98.80 %, registered July 2003 - last version July 2006

Java sound API

The Java Sound API specification provides low-level support for audio operations such as audio playback and capture (recording), mixing, MIDI sequencing, and MIDI synthesis in an extensible, flexible framework.

Very well documented


CLAM (C++ Library for Audio and Music)

CLAM is a full-fledged software framework for research and application development in the audio and music domain. It offers a conceptual model as well as tools for the analysis, synthesis and transformation of audio signals. Platform independent.
Similar frameworks: Marsyas, Maaate

Source code: C++

Licence: GNU GPL (FFTW for non-free usage)


Maaate: The Australian audio analysis toolkit

Maaate (pronounced "ma:a:it") is a set of libraries that enable audio signal analysis and feature calculation in the compressed/frequency domain. Its design allows it to support any kind of sound file, compressed or not. The current release handles only MPEG-1 compressed audio files; other formats will be plugged in later. Maaate contains a wide set of analysis modules, such as energy features or spectral features. Multiplatform
Source code: C++

Licence: GNU GPL


Decision: (regarding library API)

[Figure: General EASAIER Architecture. The diagram shows the Sound Archive and the EASAIER metadata DB being browsed/queried by the clients/users: general users (client & web), expert users (client & web) and the archive administrator, whether local, remote or on location.]

The EASAIER UI should cater for general and advanced users, but essentially the same client software will be deployed in both cases.
