Posts Tagged ‘lak11’

Traditional BI has permeated the education function, at least in terms of the available platforms. Nearly every LMS provides some kind of reporting, but some systems are really advanced. For example, SABA’s BI (I think they use Business Objects) and SABA Social, the Mzinga OmniSocial platform analytics, Valdis Krebs’ work at orgnet, and the work being done at Radian6 don’t leave much to the imagination.

I have also come across Knowledge Advisors and their Metrics That Matter line of products for talent, performance and learning. They offer an on-demand SaaS human capital analytics system with rich dashboards and an impressive client list. They have integrations with LMSs, conference attendance records (phone/web), surveys, and interfaces for data capture from offline events. They build the big-picture RoI and help organizations make predictions and data-driven decisions about learning and performance.

The discussion on the LAK11 Moodle forum and the presentations by Ryan Baker and Kim Arnold on their work show a differentiated experience in the academic sector: the tools do exist, but implementation is far from uniform.

These are fairly large datasets, at least for a medium to large organization or university. I disagree with scores, tool access patterns or attendance being used to make predictions. Scores are not representative of competence unless the tests are really reliable and have predictive power (as in high-stakes testing).

Likewise, building judgments on comparative analytics (between high and low performers) of usage of, and behaviors on, a learning tool or system is like reverse engineering the design of a learning program – looking at it upside down.

Attendance or time spent is not a standalone indicator of learning performance, if it is an indicator at all. It is difficult to capture quantitatively in online interactions where, for example, a single tweet could reflect a microsecond or weeks of learning effort.

On a related note, we are not only looking at snapshot data, but also temporally changing data. And in looking at this temporal change, one forgets that the sources of this data have changed – students have moved on, infrastructure has changed, teachers have been reallocated and so on. When the base has shifted so much between two points in time, how can the corresponding statistical results be compared to one another?

Similarly, Freakonomics-inspired data scientists would have a field day generating correlations among seemingly unconnected data, basing corrective actions on those predictions and then believing that their actions yielded results. In the process, they would, by definition, have ignored several other pieces of seemingly unconnected data, undermining the very starting point of the analytics itself.

Where the Semantic Web has made its start at analytics is by evolving SPARQL – a way to query graph data / linked data. SPARQL in its current version does not include inserts or updates, but it allows many of the typical query-language affordances, such as joins. The interesting thing is that, just as relational databases return (single or multiple) data views, a query operation on a graph can return a graph itself. This means that, theoretically at least, assuming the entire web were modeled and linked, we could start from any state and keep exploring indefinitely. Why? Because everything would be connected to everything else (and maybe in less than six degrees).
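To make the graph-in, graph-out point concrete, here is a minimal sketch (my own toy data, not anything from LAK11) using Python’s rdflib library: a CONSTRUCT query over a small graph returns another graph, which could in turn be queried again.

```python
# Sketch: a SPARQL CONSTRUCT query over a graph returns a graph (rdflib).
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix ex:   <http://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

ex:alice foaf:knows ex:bob .
ex:bob   foaf:knows ex:carol .
""", format="turtle")

# CONSTRUCT does not return rows; it builds a new graph from matched patterns.
result = g.query("""
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT { ?a foaf:knows ?c }
WHERE     { ?a foaf:knows ?b . ?b foaf:knows ?c }
""")

derived = result.graph          # the result is itself a graph
for s, p, o in derived:
    print(s, p, o)              # alice "knows" carol, as full URIs
```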

In this context, I found Dragan Gašević’s presentation interesting from the object-relational semantic web perspective, specifically around LOCO (Learning Object Context Ontology). In order to build context, Dragan identifies a mix of ontologies needed to describe it – domain and user ontologies being the most obvious. Using those ontologies, he presents a way to combine the social web with the semantic web. He comes across the same old challenges, though, in trawling the web for unstructured data, and ends with the familiar buzzwords – personalized, interactive, social, collaborative and ubiquitous.

Read Full Post »

LAK11: The Connective Semantic Web?

In the whole discussion about semantic modeling, linked data is supposed to be a type of graph data, as opposed to object, hierarchical or relational data. Two sets of data can be related if they share intersecting vocabularies and ontologies.

Ontologies can be expressed using OWL. OWL extends the RDF vocabulary of relationships and lets you define how classes and properties themselves can be related:

OWL adds more vocabulary for describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. “exactly one”), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes.

So OWL allows us to model a domain. We may choose to model it hierarchically, relationally, or as a graph using a class-property structure. It is an object-relational model in the sense that it lets you create ontologies that are distinct from their RDBMS-level implementations.
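As a rough illustration of that class-property structure, here is a small sketch using rdflib and an invented example.org vocabulary: two disjoint classes, an object property with a domain and range, and an individual described against them.

```python
# Sketch: a toy OWL ontology (classes, properties, one individual) in rdflib.
from rdflib import Graph, Literal, Namespace, OWL, RDF, RDFS

EX = Namespace("http://example.org/learning#")
g = Graph()
g.bind("ex", EX)

# Classes: Learner and Course are declared as disjoint OWL classes.
g.add((EX.Learner, RDF.type, OWL.Class))
g.add((EX.Course, RDF.type, OWL.Class))
g.add((EX.Learner, OWL.disjointWith, EX.Course))

# A property relating them, constrained by domain and range.
g.add((EX.enrolledIn, RDF.type, OWL.ObjectProperty))
g.add((EX.enrolledIn, RDFS.domain, EX.Learner))
g.add((EX.enrolledIn, RDFS.range, EX.Course))

# An individual described in terms of the ontology.
g.add((EX.alice, RDF.type, EX.Learner))
g.add((EX.lak11, RDF.type, EX.Course))
g.add((EX.lak11, RDFS.label, Literal("LAK11")))
g.add((EX.alice, EX.enrolledIn, EX.lak11))

print(g.serialize(format="turtle"))
```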

Where it is deficient is that we still do not have any way of coding class behaviours. Remember when XML-RPC came in? It offered a way to programmatically extend XML to call methods over HTTP. Similarly, in object-relational systems, class behaviours are scriptable, allowing runtime changes to happen.

Since Linked Data is not totally object-relational at this point, it cannot benefit from scripts that can adjust properties and relationships on the basis of changes happening in related linked data.

Another aspect is that in the real world, vocabularies are dynamic. Ontologies change over time, and specializations and inter-disciplinary areas emerge. The relationships therefore also change, and new ones are continually created.

Obsolescence also plays a big role in trimming vocabularies, as does their archival process. There is a management role and overhead, complicated by constant new additions from crawling the unstructured web. The scale alone renders this an impossible endeavour, and we will therefore end up creating vocabularies and ontologies not unlike the RDBMSs of today.

The next aspect is linguistic and cultural differences. The web is diverse and multi-cultural. Building semantic equivalence across these categories is going to be a complex ontological task in itself.

But these would matter only if the Linked Data vision were to model the world. I don’t think it is, or can be. The vision is to help build a more connected web. It is also to help build a more coherent web.

There is a great challenge in building coherence – take a look at the uneasy implicit alliance between Demand Media and Google. Ben Elowitz writes:

Demand Media probably pollutes Google’s results more blatantly and thoroughly than the top black-hat spammers of the Web.

The closest I have come to building a linked data web with behaviours, without using RDF and OWL, is when we built a simulation framework to model real-life sales situations. We called our (loose equivalents of) RDF triples Facts of Life (FoLs). Each FoL was related to other FoLs and could be described by a value, an enumeration, a range or an expression. Changes in one FoL could impact other FoLs through a set of business rules and computational expressions. FoLs could nest within each other. XML was the base platform for writing the FoLs and their relationships. We built probabilities across each FoL to reflect experienced customer profiles, but that is another story.
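Purely to convey the flavour, here is a much-simplified, hypothetical reconstruction in Python; the class name, the rule mechanism and the sales example are mine, and the original framework was an XML vocabulary rather than code like this.

```python
# Hypothetical sketch of the "Facts of Life" idea: related facts plus rules.
class FoL:
    """A Fact of Life: a named value plus rules that react to related FoLs."""
    def __init__(self, name, value):
        self.name, self.value = name, value
        self.rules = []                      # list of (source FoL, update fn)

    def depends_on(self, other, update):
        self.rules.append((other, update))

    def recompute(self):
        for source, update in self.rules:
            self.value = update(self.value, source.value)

# Two related facts: customer budget and the likelihood of closing the sale.
budget = FoL("customer_budget", 100_000)
close_odds = FoL("probability_of_close", 0.5)

# A business rule as a computational expression:
# if the budget drops below 50k, halve the odds of closing.
close_odds.depends_on(budget, lambda odds, b: odds * 0.5 if b < 50_000 else odds)

budget.value = 40_000          # a change in one FoL...
close_odds.recompute()         # ...impacts a related FoL via the rule
print(close_odds.value)        # 0.25
```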

What Connectivism and Connectionism tell us is that there are mechanisms that operate in networks. Connections are not static. The graph keeps on changing over time and at varying rates. Not only that, complex and chaotic systems move in complex and chaotic ways.

Without an accommodation of this fact, the Semantic Web would be able to model entities (definitions) and relationships, but not behaviours.

On another related note, I still remember working on Jini (http://www.jini.org/wiki/Main_Page) when it was first brought out in competition with UDDI.

Jini technology is a service oriented architecture that defines a programming model which both exploits and extends Java technology to enable the construction of secure, distributed systems consisting of federations of well-behaved network services and clients. Jini technology can be used to build adaptive network systems that are scalable, evolvable and flexible as typically required in dynamic computing environments.

Jini is designed to help developers deal with The Eight Fallacies of Distributed Computing. Jini offers a number of powerful capabilities such as service discovery and mobile code.

A stronger model of a semantic web could, in my opinion, emerge if we were to focus on building adaptive network systems that help derive meaning through open and collaborative efforts – systems that allow people to build skills and engage in sensemaking.

Rather than trying to model the world, let the world model itself.

Read Full Post »

Before social media happened to us, there were vocabularies such as the ones encountered in the Semantic Web. You could see these everywhere – from programming languages to news to business. Specifically, vocabularies such as those embedded in NewsML-G2, EDIFACT and ANSI X12 echoed a need to standardize data exchange across a wide range of industries, providing an integrated technology platform for transacting business documents across the supply chain.

SOAP- and REST-based web services came along to offer an easier XML-based way to communicate the same standardized information (and then the Windows Communication Foundation). Various systems emerged to manage XML, including Cloudscape (a Java-based DBMS) and Tamino, one of the first native XML databases; most tried to marry database architectures to the XML platform. Meanwhile, DTDs evolved into XSDs as the schema definition component. On the BI side, Hyperion Essbase and Cognos arrived with compelling BI facilities, accompanied by SQL Server OLAP and the Oracle data warehouse.

In education, the AICC and SCORM (and IMS QTI – Question and Test Interoperability) standards set the vocabulary to be adopted for communication between a course (and assessment) and the Learning Management System. DITA (Darwin Information Typing Architecture) emerged as an XML-based standard for authoring, managing and publishing content. The S1000D standard emerged to describe equipment. Interfaces have emerged between SCORM on the one end and DITA and S1000D on the other.

But social media happened to us and suddenly everything went open and “un-designed”. The term folksonomy came into vogue to describe common, non-centralized vocabularies, which is what much of the effort by search engines like Google came to be based on. Even now, tags and their analysis form a central part of social network analysis.

As a result, business and education technology have had to play catch-up with SoMe. Perhaps we are still waiting for maturity in the education space, but business has shown good signs of adopting and using social media already – once the transactional aspect and the commercial angle were worked out. Even so, SoMe fitted well into information spaces, customer relationship management, sales and marketing, but not, in any widespread way, into the core aspects of transactions and operations.

The answer was the Semantic Web, which attempts to bring the SoMe quotient and the EDI quotient into some kind of alliance, extending far beyond that as well to encompass search, business intelligence and big data.

Lots of views exist on why the Semantic Web would fail. Stephen talked from the SoMe side, favouring personal computing, and felt intuitively that the Semantic Web would fail because “it depends on businesses working together, on them cooperating”. There was also apprehension elsewhere that this would not happen, as well as the opinion that the vast majority of data on the web is unstructured (and will remain so).

In fact, even before Stephen voiced his intuition, Clay Shirky had stated in 2003 that it would fail:

“However, like many visions that project future benefits but ignore present costs, it requires too much coordination and too much energy to effect in the real world, where deductive logic is less effective and shared worldview is harder to create than we often want to admit.”

Cory Doctorow points to other problems, such as schemas not being neutral, the general user’s competence and will, inherent bias, and the fact that metrics will influence the choice of metadata.

So Tim Berners-Lee expounded a new, less rigid term: Linked Data. Tyler Bell, writing on “Where the semantic web stumbled, linked data will succeed”, states:

Successful adoption will often entail sacrificing standardization and semantic purity for pragmatic ease-of-use; this is where the semantic web appears to have stumbled, and where linked data will most likely succeed.

So is there a difference between Linked Data and the Semantic Web? That seems to be an ongoing debate (Nick Gall, Lorna, Design Issues and ReadWriteWeb).

Visiting the tutorial on the LinkedData website and “Linked Data – the story so far”, the evidence suggests it is not different in thought or implementation from the Semantic Web, and it shares the same implementation stack (RDF, OWL and SPARQL).

So what will happen to non-common, user-generated folksonomies? With Linked Data, there cannot ideologically be folksonomies. SKOS seems to be an initiative that provides a migration path.

SKOS—Simple Knowledge Organization System—provides a model for expressing the basic structure and content of concept schemes such as thesauri, classification schemes, subject heading lists, taxonomies, folksonomies, and other similar types of controlled vocabulary. As an application of the Resource Description Framework (RDF), SKOS allows concepts to be composed and published on the World Wide Web, linked with data on the Web and integrated into other concept schemes.
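As a rough sketch of what such a migration path could look like, here is an invented example in Python/rdflib that expresses a few blog tags as a SKOS concept scheme; the tag vocabulary and the broader/related links are placeholders of my own.

```python
# Sketch: a handful of folksonomy tags expressed as a SKOS concept scheme.
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import SKOS

TAGS = Namespace("http://example.org/tags/")
g = Graph()
g.bind("skos", SKOS)
g.bind("tags", TAGS)

g.add((TAGS.scheme, RDF.type, SKOS.ConceptScheme))
g.add((TAGS.scheme, SKOS.prefLabel, Literal("Blog tags", lang="en")))

# Each tag becomes a SKOS concept within the scheme.
for tag in ("lak11", "analytics", "semanticweb"):
    g.add((TAGS[tag], RDF.type, SKOS.Concept))
    g.add((TAGS[tag], SKOS.prefLabel, Literal(tag, lang="en")))
    g.add((TAGS[tag], SKOS.inScheme, TAGS.scheme))

# Looser folksonomy relationships map onto broader/related links.
g.add((TAGS.lak11, SKOS.broader, TAGS.analytics))
g.add((TAGS.analytics, SKOS.related, TAGS.semanticweb))

print(g.serialize(format="turtle"))
```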

Read Full Post »

LAK11: From Tin Can and LETSI

I mentioned Project Tin Can before on this blog. To give you a snapshot of the top ideas in their forum (please contribute as much as you can), here are some of the user ideas on how learning experiences should be tracked:

  • Distributed content – include content across organizational boundaries
  • Transparency of SCORM runtime data
  • Collaboration between multiple learners and teams
  • Handle sequencing dynamically
  • Move SCORM outside the browser
  • Tracking learning not hosted in an LMS

I think these ideas about the learning experience will resonate with the LAK11 and PLENK communities and are important enough to be considered seriously.

In this post, I want to cover what LETSI (Learning, Education, Training Systems Interoperability) is doing.

LETSI has many working groups. One is defining the next-generation Runtime Web Services (RTWS) layer for runtime communication between the LMS and content using state-of-the-art web technology. The Orchestration working group is working on expanding the sequencing specification in SCORM.

While “sequencing” implies the ordering of activities over time, we anticipate other ways in which things need to be combined: components need to be combined to make adaptable activities, different “players” need to be combined to engage in a collaborative or competitive activity; and learning delivery services need to interoperate with external data services. We see sequencing as just one type of orchestration. 

The Content As A Service (CAAS) working group is exciting. They are talking, among other things, about separating learning activities from the LMS so that they can be made available by separate content providers. This could be a photo gallery shared on Facebook, a question shared on Quora, or this blog post – any resource/activity that could simply launch, engage the learner and report back metrics.

Then there is the Learning Activity Description (LAD) working group. This, in my opinion, is a very important group because it seeks to define a framework for describing what a learning activity is, how it gets triggered and what it can result in. Defined in the network / SoMe context, this could cover practically every activity (or generation of learning artifacts) that a learner would engage in.

So, for example, a blog post today structurally provides two pieces of information: about the content and context of the post, and about the collaboration with the network that accessed or commented on it. Suppose a content provider offered a blog service in which every post could be rated by the viewer, and which also exposed the rating data back to the provider. If a competence metric stated that the rating has to be above a particular level for the learner to be deemed competent in the field the content refers to, then the blog post could be termed an instance of a learning activity. At the other end of the spectrum, it could be a complex simulation activity with complex outcomes and data.
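To make the hypothetical concrete, here is a toy sketch in Python. The rating threshold, the field names and the notion that an average rating could stand in for competence are all my inventions for illustration, not anything LAD has specified.

```python
# Toy sketch: a rated blog post treated as a learning activity instance.
from dataclasses import dataclass, field

@dataclass
class BlogPostActivity:
    url: str
    topic: str                        # the field the content refers to
    ratings: list = field(default_factory=list)

    def record_rating(self, stars: int):
        self.ratings.append(stars)    # exposed back to the content provider

    def average_rating(self) -> float:
        return sum(self.ratings) / len(self.ratings) if self.ratings else 0.0

def deemed_competent(activity: BlogPostActivity, threshold: float = 4.0) -> bool:
    """The illustrative metric: average viewer rating above a set level."""
    return activity.average_rating() >= threshold

post = BlogPostActivity("https://example.org/lak11-post", "learning analytics")
for stars in (5, 4, 4):
    post.record_rating(stars)
print(deemed_competent(post))         # True for this toy data
```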

With simple to complex educational-outcome-related tweaks to existing platforms, our sense-making artifacts could provide a third source of inputs to the analytics process posed by George (apart from profile and intelligent data): data that provides direct competency-related performance information arising out of learning activities. As the group suggests in the Activity ontology definition (slide 24):

Traditional Instructional Object classifies performance data to the Expected Patterns = Outcomes. Its Outcomes: standard completion, failure, error type, … Intelligent Instructional Object performs assessment of Competencies and results in Outcome = Competence Profiles.

For knowledge analytics, this could be seen as a way to think about measuring the gap between actual and required competency levels based on a competency framework.
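A crude sketch of what such a gap measurement could look like, with an invented competency framework and invented levels:

```python
# Sketch: the gap between required and actual competency levels.
required = {"data analysis": 4, "SNA": 3, "visualisation": 3}   # framework
actual   = {"data analysis": 2, "SNA": 3, "visualisation": 1}   # from outcomes

gaps = {skill: required[skill] - actual.get(skill, 0) for skill in required}
priorities = sorted((s for s, gap in gaps.items() if gap > 0),
                    key=lambda s: gaps[s], reverse=True)

print(gaps)         # {'data analysis': 2, 'SNA': 0, 'visualisation': 2}
print(priorities)   # ['data analysis', 'visualisation']
```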

But a crucial requirement for LAD to work is a shared vocabulary, or ways to specify one. Another crucial requirement is shared data areas (shared memory). This is the work of the final working group, on Namesets and Common Memory.

I like what LETSI and Tin Can are doing, from multiple perspectives. Firstly, as a long-time SCORM sufferer, I find this openness and ability to move with the times great. Secondly, there is an appreciation that the process of learning and teaching cannot be constrained to a set of data models and runtime APIs. Thirdly, there is now a way that non-LMS and non-Learning-Object methodologies can potentially be made to work – in a distributed, open, cross-platform and connective manner. From the LAK11 perspective, both from the knowledge and learning analytics standpoints, these are significant developments.

Read Full Post »

LAK11: Metrics

I had a good discussion almost two years ago on LMSs and RoI. My observations were:

  1. Organizations use LMS metrics to measure employees’ learning and development and to derive RoI from training initiatives. Obviously, tracking and automated, flexible reporting of any sort is valuable to any organization in any function – provided it is accurate to start with. And obviously, lots of time and effort in organizations is spent on validating data from an LMS, which in turn provides a source of constant improvement, just as with systems in other functions. These systems provide base data upon which further analyses can be conducted.
  2. At the very atomic level, tracking data is captured for an individual course. This tracking data is used as the input for other data capture around compliance, development plans and certifications. The fundamental question asked is “did employees learn?” or, in predictive terms, “can employees perform?” – whether it is to demonstrate compliance with legal requirements, track whether an individual is progressing as per the development plan, or certify them for skills. That is, at the atomic level, data captured for the course is directly tied to asking “did employees learn?” or “can they perform?”.
  3. This atomic tracking data for an LMS is time spent, attendance, scores, and satisfaction ratings (cursory or detailed, plus additional parameters as you suggested). Performance management systems could include mechanisms to track or correlate from other perspectives as part of appraisal processes, perhaps thereby adding to the accuracy of analytics.
  4. This data is tracked by means of assessment instruments, such as summative assessments, that use items of multiple types – multiple choice, Likert scale, etc. These instruments and their utility must be separated from their typical use and effectiveness. So it would be wrong to infer “no multiple choice questions, assessments, pre-tests, or Likert-scale surveys EVER”. Rather, their typical use and effectiveness in determining whether “an employee has learnt” or “an employee can perform” is what matters, and is a key aspect of determining RoI.
  5. These instruments are very powerful if they (and their constituent items) meet the basic requirements of educational testing – reliability (whether the assessment consistently achieves the same result) and validity (whether it really measures what it is intended to measure); see the sketch after this list for one common reliability measure.
  6. This requires special expertise and time to create (not just by mapping to an established taxonomy) and to establish for every course. The LMS has nothing to do with this process. This is evidenced in high-stakes assessments like the SAT or GRE, which have a long and statistically backed development process.
  7. For routine courses, perhaps not many organizations or their development vendors would either know how or spend the time and effort to create statistically valid tests. One would expect, though, that at least certification testing would follow a much more rigorous test creation process because of the stakes involved.
  8. Also, some instruments may be better for testing certain types of knowledge or ability than others. For example, multiple-choice questions don’t necessarily lend themselves to much more than recall of facts and routine procedures. There is a choice involved that, on a larger scale, impacts the metrics that the LMS collects.
  9. Let us look at time spent. Typically, the LMS records the beginning of a session and the end/suspend time and adds the difference to the overall elapsed time, giving us a sense of the total duration the learner has spent on the course. What can we derive from this measure? Some learners may learn faster, some slower. Some may be distracted by a phone call; others may just not have enough time to go through it all in one attempt and therefore take longer to complete. What can we glean from this? Similarly, attendance: what can we say for that, especially in larger or virtual classes where it is easy not to be noticed, although you could still be “there”? I am interpreting both of these in the sense of “did employees learn?” or “can they perform?”
  10. Again, “tracking development vs. a learning plan prepares people to advance” is traditionally accepted as perhaps the best way to proceed. However, there are new perspectives, such as those brought about through connective, networked learning, communities of practice and informal learning, that merit some thought and attention, at least in terms of the impact they could potentially have on how we learn and on how we have traditionally managed these challenges.
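As referenced in point 5 above, here is a small sketch of one standard reliability measure, Cronbach’s alpha, computed over a toy matrix of item scores; the data and the commonly cited 0.7 threshold are illustrative only.

```python
# Sketch: Cronbach's alpha as a basic reliability check for a test.
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """scores: rows are test takers, columns are items."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

scores = np.array([[1, 1, 1, 0],
                   [1, 1, 0, 1],
                   [0, 1, 1, 1],
                   [0, 0, 0, 0],
                   [1, 1, 1, 1]])
print(round(cronbach_alpha(scores), 2))   # ~0.7; higher is usually better
```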

Based on the next wave of thinking on Learning Analytics, we are seeing a lot of movement around metrics for lifestreaming and for merging the digital and physical worlds. There have been various attempts to survey MOOC participants for information regarding their interactions and profiles (see, for example, Antonio Fini on CCK08 and Jenny, John and Roy’s work on CCK08 [full project report]). I believe there are also some covering how MOOCs should be designed (John Mak in PLENK2010). These should focus on measuring the course itself.

Here we get into an interesting new domain. It is one of the subjects for LAK11 – knowledge analytics.

George Siemens defines knowledge analytics (Educause presentation) as:

Linked data, semantic web, knowledge webs: how knowledge connects, how it flows, how it changes

But what does that imply for our metrics discussion? Stephen Downes, talking about Network Semantics and Connective Learning, defines three major elements of a network – entities, connections and signals (messages interpreted by receivers). The degree of connectedness in a network is, according to him, a function of the density of the network, the speed of communication, flow/bandwidth and the plasticity of connections. Given these, context, salience, emergence and memory become essential elements of network semantics.

Connective semantics is therefore derived from what might be called connectivist ‘pragmatics’, that is, that actual use of networks in practice. In our particular circumstance we would examine how networks are used to support learning. The methodology employed is to look at multiple examples and to determine what patterns may be discerned. These patterns cannot be directly communicated. But instances of these patterns may be communicated, thus allowing readers to (more or less) ‘get the idea’.

What are the metrics that can support these semantics, then? 

Tying these back to George’s vision of Intelligent Data and a phase of knowledge analytics where we try to estimate the distance between the current and desired levels of skill, I think course-level design metrics should at least cover the following categories:

  1. Metrics based on the four elements that Stephen defined for differentiating a learning network from any other network – autonomy, openness, diversity and interactivity/connectedness – this time from the course design perspective. I had earlier thought about metrics based on these from the Connectivist, learning analytics perspective.
  2. The robustness of the network for learning needs – this could include availability and adequacy of connections (people and resources), permeability, recommender system efficiency, speed of information flow, information processing capacity (bandwidth), etc.
  3. Metrics for the evolution of a learning network – this could include defining the state of the network as such, the state of an individual learner with respect to the network, the density of connections, etc. (a small sketch of a few such network measures follows this list). I think it will be useful to draw on some common models of group evolution or open collaboration.
  4. Metrics generated from social collaborative learning instruments – not tools, but instruments analogous to what multiple choice questions are in traditional courses (what I call Native Collaboration techniques, with their genesis in Critical Literacies)
  5. Level of personalization (environmental adaptation) – measures that record how well the environment personalized itself to specific learner requirements
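As mentioned in category 3, here is a toy sketch of a few such network-level measures, computed with networkx over an invented learner/resource interaction graph.

```python
# Sketch: simple network measures over a toy learning network.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("alice", "bob"), ("alice", "course_blog"), ("bob", "course_blog"),
    ("carol", "course_blog"), ("carol", "dave"),
])

print(nx.density(G))                          # how densely connected it is
print(nx.is_connected(G))                     # is everyone reachable?
print(nx.average_shortest_path_length(G))     # a proxy for speed of flow
print(sorted(nx.degree_centrality(G).items(), # who the hubs are
             key=lambda kv: kv[1], reverse=True)[:3])
```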

There will doubtless be more categories. Maybe there is also some work already underway to remodel Kirkpatrick’s four levels of evaluation in the SoMe context, which should throw up some more categories and metrics.

Read Full Post »

LAK11: Learning Analytics with LAK11

Of course, the world changed with SoMe. No longer were we thinking about a central portal everyone came to; rather, the service became the medium for distributed communication – with distributed cores. Each core, each node, generating intelligence, transmitting and amplifying information. We have started leaving data trails across various Web 2.0 and Web 3.0 services.

It is like marking a path with evidence of where you have been and what you have done. It throws open rich analytic and predictive possibilities for software systems, and far greater interoperability for data exchange between systems using open APIs.

George Siemens talks about these trails in depth, and he has started a Learning Analytics Google Group.

Whereas the first generation of learning analytics featured analysis of base data such as page visits, time spent, forum interactions, tool usage and basic user-to-user interactions, the second generation of Learning Analytics focuses on extracting data from lifestreams. George Siemens describes LA as going beyond web analytics and educational data mining.

Learning analytics is broader, however, in that it is concerned not only with analytics but also with action, curriculum mapping, personalization and adaptation, prediction, intervention, and competency determination.

In George’s vision of the LA process, there are two broad sources of data. One is learner data that we collect through lifestreaming, the LMS and PLEs. The other is contextual or intelligent data – curriculum, linked data and semantic data. While learner data helps us build “profiles”, learner and intelligent data together feed forward into analyses of various types, such as SNA and Signals through data trails. This builds the “basis for prediction, intervention, personalization, and adaptation”.

George’s vision is that LA will be transformative through systems that analyze who the learner is, what her skills are and how they measure up against the state of the art, given a context. This marks a big transition from pre-designed curricula to “a real-time rendering of learning resources and social suggestions based on the profile of a learner, her conceptual understanding of a subject, and her previous experience.” It also changes how we think about competency and performance.

Responding to a question about scalability and the division between automated analytics and human interventions in analytics on the Learning Analytics Google group, George acknowledges that we would need to harmonize the technical and social dimensions of learning and analytics. We would need to understand better what technology can do and what human interventions can do.

David Wiley also pointed out, correctly, that the cost and difficulty of aggregating lifestream information has gone down considerably. At the same time, he is concerned that Learning Analytics may become a sort of Behaviorism 2.0 and suggests that:

It seems absolutely critical to me that the results of LA can provide only a portion of the data necessary for making decisions, and that it must be a human with more subtle meaning-making capabilities that ultimately acts on the data coming out of LA.

George also points out nine different dimensions, including:

  1. Learning Analytics version 1, where traditional analytics are used
  2. Web Analytics with SNA
  3. Distributed Network analysis – going from a single tool or platform to analysis across tools and platforms
  4. Social Integration: where content and connections get semantically linked to lifestreams
  5. Semantic/Linked Data leverage
  6. Knowledge analytics: some way to describe the current vs. expected state of learning and knowledge. I talked about Connection Holes from the Connectivist standpoint: very simply, if learning is the process of making connections, then learning is deficient, or has holes, if the right connections are not made (see the toy sketch after this list).
  7. Intervention/Personalization/Adaptation
  8. Holistic physical/virtual world analytics
  9. Lastly, a full integration of “what we do on a daily basis”
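As referenced in dimension 6, here is a toy sketch of the connection-holes idea: compare the connections a learner has actually made against a set of connections the course design expects. The expected and actual sets are invented.

```python
# Sketch: "connection holes" as expected connections that were never made.
expected = {("learner", "peer_feedback"), ("learner", "core_reading"),
            ("learner", "study_group"), ("core_reading", "study_group")}
actual   = {("learner", "core_reading"), ("learner", "study_group")}

holes = expected - actual            # where learning is "deficient"
coverage = len(actual & expected) / len(expected)

print(holes)
print(f"{coverage:.0%} of expected connections made")
```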

Rebecca Ferguson has done a summary of our discussions so far, which is worth a look. There is also a bibliography that should have found its way into LAK11 here.

Read Full Post »



My experience with Learning Analytics started about ten years ago at egurucool.com, arguably India’s first large-scale online learning initiative for the K12 segment, where I was CTO. Charged with building the technology frameworks to support over 20,000 users and over 12,000 hours of learning content across grades 9-12, we designed and built a complete LCMS with analytics.

The tools included distributed authoring, workflow, versioning, content management, content publishing, learning management, student performance reporting, customer relationship management and business analytics. The inherent re-use model enabled deployment of an on-the-fly configurable and expanding list of products to the online web, captive school systems, print, dynamic assessment and franchise centre applications.

At the centre of the architecture was the concept of a content form. My logic was that content can take many forms of organization, from simple to complex types, but each content form would carry an associated tangible meaning. For example, a list of pre-requisites would have a form involving three elements – the pre-requisite sequence number, the statement and a brief explanation. A content form in its simplest version was a formatted text/image (FTI – rich text or HTML) fragment. So a pre-requisite would be a combination of a sequence number and two FTIs.

The presentation layer, given its understanding of this content form, could decide to represent it visually in multiple ways, depending on the target medium (web, print, custom). Similarly, a multiple choice question would be rendered as a complex content form with the stem and options being FTIs. Each content form was given a unique identification number that helped track interactions on, and the development of, that artefact through the entire lifecycle – from authoring to use by the student.
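For illustration only, here is a hypothetical reconstruction of a content form as code. The class and field names are mine and the original system was far richer, but it conveys the idea: a typed structure of FTIs with a stable identifier, rendered differently per medium.

```python
# Hypothetical sketch of a "content form": typed FTIs plus a stable ID.
from dataclasses import dataclass

@dataclass
class FTI:                     # formatted text/image fragment (rich text/HTML)
    html: str

@dataclass
class Prerequisite:            # a content form: sequence number plus two FTIs
    form_id: str               # unique ID used to track the artefact end to end
    sequence: int
    statement: FTI
    explanation: FTI

    def render(self, medium: str) -> str:
        # The presentation layer decides how the same form appears per medium.
        if medium == "print":
            return f"{self.sequence}. {self.statement.html}"
        return (f"<li id='{self.form_id}'>{self.statement.html}"
                f"<p>{self.explanation.html}</p></li>")

p = Prerequisite("CF-000123", 1,
                 FTI("Know basic algebra"),
                 FTI("Linear equations are used throughout the unit."))
print(p.render("web"))
print(p.render("print"))
```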

A page of content, on the web/custom application or in print, would contain one or more such content forms, or fractions of them. We had over a million such content forms developed in the space of a year and a half.

Such an approach made it easy to assemble, disassemble and reuse content. With the presentation style separated from the data and the structure of the data, the entire system became extremely flexible. Creating a test-prep product involved writing some rules to extract only practice exercises and online tests from the database for a specific grade/class. It also made rapid deployment and upgrades of course material possible.

An important factor was homogeneity of input data in terms of raw formats. Today, a blog post is structurally indistinguishable from the content of an email message: it has a title, a URL/identifier, a content/body, tags and categories. It is the packaging (the envelope) that differs. We had to take special care to preserve homogeneity of the raw format.

So things got difficult, often impossible, when we crossed format boundaries and were faced with the still-unsolved question of content equivalence – how do we make homogeneous (or render to the same base denomination) content that exists in different formats? It is easy to start from a given format and think about converting to others (start with text, convert to audio or even animated video stories). But taking two different objects in different formats and trying to establish homogeneity is difficult, if at all desirable. We homogenized what we could, and left the rest embedded in their native formats.

From the point of view of analytics, the content form was a base dimension of our analytics. A content form could be a fraction of a web page or could span multiple pages. Based on its association with the curriculum structure, we always knew basic statistics about who had accessed what, with what frequency, when and for how long. It was easier to determine performance statistics from scored content forms such as online quizzes.
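A sketch of the kind of basic access statistics this made possible, aggregated per content form from raw tracking events; the events and field names are invented.

```python
# Sketch: per-content-form access statistics from raw tracking events.
from collections import defaultdict

events = [  # (student, content_form_id, seconds_spent)
    ("s1", "CF-000123", 120), ("s2", "CF-000123", 300),
    ("s1", "CF-000456", 60),  ("s1", "CF-000123", 90),
]

stats = defaultdict(lambda: {"visits": 0, "students": set(), "seconds": 0})
for student, form_id, secs in events:
    s = stats[form_id]
    s["visits"] += 1
    s["students"].add(student)
    s["seconds"] += secs

for form_id, s in stats.items():
    print(form_id, s["visits"], "visits,",
          len(s["students"]), "students,", s["seconds"], "s total")
```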

It was also easy to build contextual searches based on the exact content form, in addition to knowing the content. And from the services/tools perspective, we had a series of tools, like Ask an Expert, that could pick up specific context because we always knew where the student was. Interestingly, all this user activity was also made available to our CRM, which could run targeted queries to determine the level of activity across individuals or groups, thus providing actionable strategies for intervention. Our data warehouse, based on Oracle, contained all tracking and performance information and had multiple data cubes for analysis by CRM, teachers and SMEs.

Of course, all these tools helped us provide an awesome dashboard for our students. They knew exactly how they were using the medium for learning, in addition to the performance reporting for the tests they took. By the time NIIT bought us out in 2002, we had already started taking it to the next level in all directions. Had we continued working on the solutions we built, I think we would have benefited the most from the Web 2.0 / SoMe explosion.

Read Full Post »