CAT | Contextual musings

The IIG User Conference

At EMC’s Information Intelligence Group (IIG) in ANZ, we recently held a User Conference.

Fairly easy, you would think: pick a date, arrange a venue, organise a few speakers, invite some customers, add a few cocktails afterwards and ‘Bob’s your uncle’.

But maybe Bob isn’t my uncle?

As a content management supplier, we should be really good at the customer-invitation part. After all, doesn’t marketing have all of the contacts sitting in a database?

Our entire installed base, technical and business contacts alike, all ready to blast with an email inviting them to the IIG event of the year?

If only it were that simple…

The issues arise when you ask how many of our contacts have opted in to our marketing database. Under the Spam Act 2003, contacts need to have consented to receive the email, and express consent is the safe option. Of course, because they are existing customers, consent could be implied from our business relationship, but marketing departments are reluctant to take the chance.
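The marketing department’s dilemma can be reduced to a simple filter. The sketch below is illustrative only. The `Contact` record, the three consent categories and the `mailing_list` function are hypothetical, and the categories simplify the Act rather than state what it requires. By default it applies the cautious rule of express consent only. It can optionally also rely on implied consent.

```python
from dataclasses import dataclass
from enum import Enum


class Consent(Enum):
    EXPRESS = "express"  # contact has explicitly opted in
    IMPLIED = "implied"  # inferred from an existing business relationship
    NONE = "none"        # no basis for sending marketing email


@dataclass
class Contact:
    name: str
    email: str
    consent: Consent


def mailing_list(contacts, allow_implied=False):
    """Return the addresses that may receive a marketing email.

    By default only express consent counts (the cautious stance);
    allow_implied=True also relies on implied consent.
    """
    allowed = {Consent.EXPRESS}
    if allow_implied:
        allowed.add(Consent.IMPLIED)
    return [c.email for c in contacts if c.consent in allowed]


# Made-up contacts for illustration.
contacts = [
    Contact("Alice", "alice@example.com", Consent.EXPRESS),
    Contact("Bob", "bob@example.com", Consent.IMPLIED),
    Contact("Carol", "carol@example.com", Consent.NONE),
]
print(mailing_list(contacts))                      # ['alice@example.com']
print(mailing_list(contacts, allow_implied=True))  # ['alice@example.com', 'bob@example.com']
```

The gap between the two lists is exactly the set of customers reachable only through personal invitations.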

Anyway, as usual, we used our existing relationships to send personal invites, so all went well (very well, in fact). But it raises some interesting points about how the content we routinely collect is used.

Over the past few weeks I’ve been pondering this, and I’ve read a few articles that highlight the way personal information is being used (and maybe abused).

We need to ask some hard questions!

I’ll start by looking at the Australian Government (one of my favourite topics).

The Greens-Labor Government in Australia has recently taken a ‘swing’ at News Limited in the wake of the UK phone hacking scandal. Prime Minister Julia Gillard indicated that there were (undefined) “Hard Questions” that need answering. Maybe there are; who knows? The phrase ‘innocent until proven guilty’ does spring to mind, however…

Then interestingly, a few days later, I read another article in The Australian describing how political parties hold far-reaching personal information on all voters, which they use for their own marketing purposes – without any oversight.

This is possible because the government of the day included an exemption for political parties from privacy laws in 2000 (against the advice of the federal Privacy Commissioner at the time).

Also, the electoral commission is required to provide electronic copies of the electoral roll to political parties and to send monthly updates. Surely some “Hard Questions” need to be answered by the Government about this?

The roll contains name, address, date of birth, age, sex and occupation. It is not too extensive, but it forms the basis of a repository of information that can be built upon. This is where the hard work begins. Or is it really that hard in these days of social media?

Social Media

Social media sites such as Facebook, Twitter and even LinkedIn are the ultimate tools for ‘opting in’. They give the world a window into your personal views, beliefs, political leanings, prejudices and behaviour. LinkedIn is a more professional tool, but it still betrays some of your views if you post comments to professional groups or volunteer status updates. Recruiters regularly use LinkedIn to identify candidates, and potential employers use it to vet them (and Facebook to see if they can rake up anything that would rule them out as suitable candidates).

Given that the data is there, how hard would it be to use a Content Management or Big Data tool (such as the excellent Greenplum products supplied by EMC) to link the data provided in the electoral roll with related data generously provided by the members of social media websites?
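To show how little effort the crudest form of linking takes, here is a minimal record-linkage sketch. The field names, the made-up records and the join key are all assumptions for illustration. The key is name plus a postcode taken from the address. It uses exact matching on a normalised key; dedicated tools typically add fuzzy matching and more attributes.

```python
def normalise(name):
    """Lower-case a name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def link_records(roll, profiles):
    """Join roll entries to profiles on a (normalised name, postcode) key."""
    # Index the profiles by key so each roll entry is a dictionary lookup.
    index = {}
    for profile in profiles:
        key = (normalise(profile["name"]), profile["postcode"])
        index.setdefault(key, []).append(profile)

    linked = []
    for entry in roll:
        key = (normalise(entry["name"]), entry["postcode"])
        for profile in index.get(key, []):
            # Merge both records; on conflicting keys the roll's values win.
            linked.append({**profile, **entry})
    return linked


# Toy, made-up records for illustration only.
roll = [
    {"name": "Jane Citizen", "postcode": "2000", "occupation": "Teacher"},
    {"name": "John Smith", "postcode": "3000", "occupation": "Engineer"},
]
profiles = [
    {"name": "jane  citizen", "postcode": "2000", "interests": ["politics"]},
]

for record in link_records(roll, profiles):
    print(record)
```

A dozen lines join an official record to whatever a person has volunteered online.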

All of this leads me to another article that I read in ITNews:

The Pentagon is apparently looking to spend $42m to ‘develop automated and semi-automated operator support tools and techniques to detect, classify and track the formation, development and spread of ideas and concepts’. This is to keep an eye on potential uprisings such as those that have recently happened in the Arab world. But what if the Pentagon identified an uprising against the US Government? Or what if ASIS identified a groundswell of support for an uprising against the Australian Government?

We have lived in what many believe to be a benign environment for the last 30 years, and many people do not see the harm in putting their lives online. But as our ‘digital shadow’ continues to grow, we cannot rely on governments to protect our data, especially when many of us choose to open up our lives to 6 billion people and counting…

The Census

This brings me to the Australian Census, taken on 9th August 2011. The questions were the usual census fare: number of people in the household, ethnicity, religion, age, income, whether you have internet access and so on.

In the current era, many people believe that ‘if you have nothing to hide, you have nothing to fear’, so providing as much information about yourself as possible is not seen as a problem. But if the last 30 years have shown us anything, it is that today’s morality is tomorrow’s bigotry. Modern hate speech legislation effectively makes thought criminals of many of us…

“Whoever would overthrow the liberty of a nation must begin by subduing the freeness of speech.” (Benjamin Franklin)

Now, don’t get me wrong: information has the power to transform organisations, and the benefits to the world are huge and still largely untapped. At EMC, that is the role we play, and I’m very proud of our achievements, but…

We ALL need to ensure that we protect our personal information from those who would abuse it. Much information these days is routinely collected for very good reasons, but much of it is (and should remain) transient.

One of the (presumably) unintended consequences of recent legislation is that freedom of speech is not what it used to be. A lot of celebrities, as well as ordinary folk, have fallen foul of it and have been forced into a public ‘mea maxima culpa’. Many see this legislation as necessary to ensure equality, and consider the loss of this liberty (in certain circumstances) justified.

I can see both sides of the argument, but I’ll leave you with another of Benjamin Franklin’s quotes:

“Those who would give up essential liberty to purchase a little temporary safety deserve neither liberty nor safety.”


In the first post on this topic I covered a bit of the history of full-text indexing and what xPlore brings to the table, and I introduced the migration options. This post focuses on those migration options and what each entails.

I could have taken a leaf out of the Lawrence Maynard Book of Controversial Writing, but I think I’ll leave the Boat People out of this one…

Firstly, some quick facts about the migration:

  • No additional license is required
  • xPlore is compatible with Content Server 6.5 SP2 (with latest hot fixes) and above
  • No client-side upgrade required (5.3+ compatible)
  • Zero-downtime upgrades can be easily achieved

In the last post I listed the key migration options as Cold Swap, Straight Migration and Dual Mode. Let’s look at each in more detail.

A Cold Swap

By far the simplest approach to migrating to xPlore is a straight technology swap: install xPlore and have it create the indexes. FAST is turned off during this process.

xPlore cold swap

xPlore can be installed on the existing FAST hardware or on a new server. Remember, though, that reusing the FAST hardware complicates your rollback option! The supported way of restoring a FAST backup is to restore the application and the FIXML and then rebuild the indexes from the FIXML.

One of my main dislikes with FAST is that it has never supported restoring the binary indexes directly, which limits both backup/restore and replication for Disaster Recovery.

So when would I go with the “Cold Swap” option? When the repositories are small (and therefore indexing time is short), or when an extended period of downtime for full-text search is acceptable for your implementation. Remember that in extreme cases indexing could take weeks with FAST. While xPlore will perform better, that volume of data will still take a long time to index.

A Straight Migration

This variation on the migration theme uses additional hardware to build a parallel full-text index with xPlore:

xPlore straight migration

New hardware is added to the infrastructure to house xPlore and an additional Index Agent. FAST and xPlore run at the same time, so the system is fully available while xPlore does its initial index build. During this period all searching is performed through FAST. The limitation is that only one set of indexing hardware can be used for searching at any given time.

Once the index build is finished, search is repointed to xPlore. At this point FAST can be turned off and eventually decommissioned.

A system outage is required to repoint searching from FAST to xPlore, as the Content Server needs to be restarted after each switch. This typically takes about an hour, but varies with the number of Content Servers and other factors.

So when would I choose this option? When I had restrictions on the amount of downtime I could impose and I was in a budgetary/procurement position that allowed me to add additional indexing servers.

Dual Mode Indexing

This brings us to the last and perhaps most extreme of the options. Dual Mode indexing and search is similar to the “Straight Migration” configuration, but it also adds an extra Content Server and an extra WebTop server into the mix. This makes both FAST and xPlore available on the same repository:

xPlore dual mode

Both FAST and xPlore perform full-text indexing, and both can be used for search (each via its own WebTop instance).

Users can be migrated over time between FAST and xPlore (again via the WebTop instances) with FAST being eventually decommissioned.

This approach is clearly the most costly but minimises migration risk and business disruption. It is most suited to those circumstances where ANY downtime of search and index is unacceptable to the user base.
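The guidance on when to pick each option can be summarised as a small decision function. This is a sketch of the rules of thumb above, not an EMC tool. The function name and its four yes/no inputs are hypothetical simplifications. A real decision would also weigh hardware reuse, rollback plans and repository size in more detail.

```python
from enum import Enum
from typing import Optional


class Option(Enum):
    COLD_SWAP = "Cold Swap"
    STRAIGHT_MIGRATION = "Straight Migration"
    DUAL_MODE = "Dual Mode"


def choose_migration(*, short_outage_ok: bool, long_outage_ok: bool,
                     small_repository: bool, can_add_hardware: bool) -> Optional[Option]:
    """Suggest a FAST-to-xPlore migration option from yes/no constraints.

    short_outage_ok:  search may go down briefly (e.g. Content Server restarts).
    long_outage_ok:   full-text search may be down for the whole index build.
    small_repository: the index build is expected to be short.
    can_add_hardware: budget/procurement allows extra servers.
    """
    if not short_outage_ok:
        # Only Dual Mode keeps search available throughout, and it needs an
        # extra Content Server, WebTop server and indexing hardware.
        return Option.DUAL_MODE if can_add_hardware else None
    if small_repository or long_outage_ok:
        # Simplest option: install xPlore, switch FAST off, build the index.
        return Option.COLD_SWAP
    if can_add_hardware:
        # Parallel xPlore build, then roughly an hour's outage to repoint search.
        return Option.STRAIGHT_MIGRATION
    # No budget for new servers: a Cold Swap remains, with a longer outage.
    return Option.COLD_SWAP


print(choose_migration(short_outage_ok=True, long_outage_ok=False,
                       small_repository=False, can_add_hardware=True))
# Option.STRAIGHT_MIGRATION
```

Returning `None` marks the case where no option meets the constraints. With zero tolerance for downtime and no budget for extra servers, something has to give.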

Concluding note

This covers the key options for migrating from FAST to xPlore. Clearly, more complicated infrastructures offer more possibilities and more challenges. For example, a highly available FAST install may allow the second row (the second set of nodes, used for redundancy) to be used for xPlore, provided non-HA indexing is acceptable during the swap-over. Virtualisation now adds further options, and so on.

If you are interested in some of these scenarios, it might be time for a chat with your friendly EMC IIG Services professional…

Climate change is a great topic of debate these days, and it is scary stuff. This article will, as usual, link back to Content Management, but you’ll need to stick with me for a while…


Climate Change photo attributed to http://www.flickr.com/photos/tellytom

Man Made Climate Change – The Key Players

In the Red Corner

On one side of the debate we’ve got the scientists and the United Nations who actively promote the idea of man-made climate change on the basis of what some would argue are flimsy models, speculative science and poor peer reviews (all lured by the funding on offer).

In the Blue Corner

On the other side we have the ‘climate change deniers’ who have been likened by some pro-climate change politicians to ‘holocaust deniers’.

They are a varied bunch; they range from those with a vested interest (due to being funded by the organisations with most to lose) to people who suggest that the Iron Mountain Report is authentic (time to Google) and that the fear of Climate Change is the latest mechanism for controlling the population. Some also claim that the UN’s Agenda 21 is the way that climate change fear is being used to destroy private property rights and move towards a New World Order…

The Spectators

Sitting in the front-row seats are governments, always looking for a new and exciting tax (some may say). They ‘believe’ that a carbon tax may be the best way to go. It would tax ‘polluters’ (like us) and use the proceeds to compensate those they believe will suffer the greatest hardship (be it big business or the low paid).

In reality, it’s just another massive redistribution of wealth, allowing ‘the state’ to further control the post-tax income of middle- to high-income workers.

Then in the cheap seats are the rest of us…

The majority of people are worried and confused. On one hand they’re worried about the fate of the planet (and future generations) and on the other hand they’re worried about whether they can pay the rent or the power bill and the furthest thought from their collective mind is whether this is a diabolical plot to introduce global government with total control of the world’s population…

So the bout has begun. We’re a few rounds in and there’s been a lot of sparring. Although the red corner appears to have the upper hand, no clear winner has yet emerged.

Despite the large amount of content floating around, there is still no consensus.

Of course, Content Management could help us to collect, collate and analyse the data, but that’s not where I’m going with this…

Enter EPFM…

DocumentumWorld will soon be focusing on the Energy Sector. Whether you wholeheartedly believe in man-made climate change, are a sceptic or are an outright denier, the fact is that efficiency leads to conserving natural resources.

The switch to renewable energy generation is also a huge programme of work for the energy sector. If it is done efficiently, the time to value of the project itself is accelerated. So is the time to value in reducing CO2 emissions and our dependency on fossil fuels.

Whichever side of the debate you are on, there should be little argument that this is:

  • Good for the environment
  • Good for future generations
  • Good for business
  • Good for consumers

EPFM is EMC’s Engineering, Plant and Facilities Management solution and is a ‘best practice’ implementation of Documentum designed to optimise critical business processes within the engineering industry.

An energy company may be looking at building and commissioning new power stations, wind farms, solar power stations in northern Queensland or even nuclear power stations. Here EPFM can optimise the six phases of the plant lifecycle (feasibility, design, construct, operate, renew and decommission). It does this by ensuring that the significant amount of content and processes shared between users, departments and external organisations can be accessed in a timely manner, without duplication, at the correct version level and without potentially expensive data loss.

Not having a solution such as EPFM has the effect of delaying project completion, increasing overall project costs and ultimately delaying the switch to renewable sources of energy.

Having EPFM would allow the energy sector to not only derive all of the financial benefits of these efficiencies but would also allow it to report accurately on its progress towards the ‘green’ goals it sets (or has set for it).

The Golden Age

So I hope that this look at EPFM has demonstrated that the Energy Sector is in a unique position to make the switch towards sustainable energy while demonstrating its progress towards these goals. Solutions such as Documentum EPFM are a key tool in achieving these aims.

Maybe by assisting the Energy Sector with its move towards sustainable and renewable sources of energy, EMC can help save the planet, avoid a carbon tax, reduce power bills and halt the march towards Global Servitude. The New World Order will have to wait for another day.


My name is Lawrence Maynard; I’m married with two boys and work for the Information Intelligence Group (IIG) of EMC as the Regional Services Director for Australia and New Zealand.
Regarding my professional background, it’s probably easier to point you to my LinkedIn profile than to write a detailed intro.

Anyway, I decided to start writing this blog because many Content Management blogs are very feature/function centric. Whilst I’m not a Documentum expert (I work for EMC and manage the Services practice for Australia and New Zealand), I know enough to be dangerous… or so I’m told.

I’m hoping that by being inside EMC I can give you some inside insight into the more business related aspects of the Documentum World and also learn something from the very talented people that are out there (you) and hopefully feed this back into the wider EMC organisation.

As this is my first blog entry I thought I’d start with a subject that is a little bit different, although I do intend to get back to the IIG Strategy in future posts.

The Digital Genome Project, Pentecost and Enterprise Content Management

Photo attribution, creative commons: http://www.flickr.com/photos/jurvetson/4621366471/

A couple of things happened towards the end of May 2010, both seemingly unrelated to each other and one of them only partially related to Documentum.

The first was the feast of Pentecost – this is one of the most important days in the Christian calendar and the events are best summed up as:-

“And when the day of Pentecost was fully come, they were all with one accord in one place. And suddenly there came a sound from heaven as of a rushing mighty wind, and it filled all the house where they were sitting. And there appeared unto them cloven tongues like as of fire, and it sat upon each of them. And they were all filled with the Holy Ghost, and began to speak with other tongues, as the Spirit gave them utterance” (Acts 2:1-4, King James Version)
The Apostles began ‘speaking in tongues’ and were understood, by everyone present, in their own language.

The second was the ‘Digital Genome’ that was secured in a bunker deep under the Swiss Alps:-
This project is the culmination of four years of work by European researchers who deposited a ‘Digital Genome’ that will provide the blueprint for future generations to read data stored using defunct technology.

The sealed box contains the key to unpicking defunct digital formats. It will be locked away for the next quarter of a century in the data storage facility known as the Swiss Fort Knox. It will sit behind a 3½-tonne door strong enough to resist nuclear attack.

So what have these two events in common?
Simply that both events are a response to the same problem: how best to ensure that data reaches the intended recipients in the correct format, that is, as information.
In the case of the Apostles, their message, delivered in an unrecognisable format, was meaningless. However, with ‘divine intervention’ their data was transformed into information.

The divine intervention enjoyed by the Apostles is probably not coming our way with regard to saving our archaic data formats. The Digital Genome project is therefore a sensible response. It ensures that information is not lost forever as the lifespan of our data formats (both hardware and software) keeps shrinking.

Why do we care?

Written information has traditionally been created in an encoded form: a language. This language can be analysed and decoded by future generations. In this way all of the accumulated and stored wisdom of a civilisation can be unlocked with the correct key, such as the Rosetta Stone.
The Digital Genome project is an attempt to proactively produce a Rosetta Stone for future generations, and as such it should be viewed as a noble project. But what is the impact of doing nothing?

We capture much more information than ever before. The Digital Genome project researchers estimate that there is 100 GB of data for every person on the planet – this is equivalent to 24 tonnes of books per person.

IDC estimates that the amount of electronically stored data will exceed 1.2 ZB in 2010. That is almost 200 GB per person, much of which is unstructured (95%), unmanaged (85%) and increasingly regulated. In addition, 85% of this data is ‘managed’ by organisations.
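The per-person figure is simple arithmetic. The sketch below assumes decimal units (1 ZB = 10^21 bytes, 1 GB = 10^9 bytes). It also assumes a world population of 6 billion, the figure quoted earlier on this blog. A larger population figure lowers the per-person result proportionally. The sketch also shows that 85% of 1.2 ZB is just over 1 ZB of unmanaged data.

```python
ZB = 10**21  # bytes in a zettabyte (decimal SI units)
GB = 10**9   # bytes in a gigabyte (decimal SI units)

total_bytes = 1.2 * ZB      # IDC estimate for 2010
population = 6_000_000_000  # assumption: roughly 6 billion people
unmanaged_share = 0.85      # share of data described as unmanaged

per_person_gb = total_bytes / population / GB
unmanaged_zb = unmanaged_share * total_bytes / ZB

print(f"{per_person_gb:.0f} GB per person")  # 200 GB per person
print(f"{unmanaged_zb:.2f} ZB unmanaged")    # 1.02 ZB unmanaged
```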

Some may argue that we have already lost an immeasurable amount of content, such as conversations and physical documentation, that will never be recovered. Much of the digital content we capture these days, it could be argued, is unnecessary. On this view, obsolete data formats are just a logical progression of the information loss that has been happening for millennia.

Others argue that we are in danger of a ‘Digital Dark Age’. The loss of valuable information to future generations could be likened to the scarcity of written records from the Middle Ages. This is certainly the motivation behind the Digital Genome.

Attribution, creative commons: http://www.flickr.com/photos/andrewwilding

There is a third school of thought. It holds that the ‘Digital Dark Age’ is being overstated. Most examples of lost information are really examples of information that has been recovered, albeit at great expense. For digital archiving to be taken seriously, we need real evidence of the costs of ‘doing nothing’. http://www.digitalpreservationeurope.eu/publications/position/Ross_Harvey_black_hole_PPP.pdf

All of the above may be valid viewpoints, but one fact is undeniable:
Information has traditionally been available to users even after a generation or more (when arguably much of its relevance is lost) because it was stored in a universally readable, low-tech format (assuming we have the ability to translate).

Digital information, however, is stored in a proprietary, high-tech format that, due to the rate of technological change, may be disappearing at an ever-increasing rate, well within its useful life.

This problem is highlighted by the Digital Genome project, which estimated that the European Union loses data valued at three billion euros every year. This is likely to increase as the lifespan of digital formats shrinks. The current estimated lifespan is less than 20 years for hardware technology such as optical drives and 5-7 years for data formats. Both are well within the retention policy timeframes of many organisations.
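To see why a 5-7 year format lifespan matters, count how many format migrations a record needs during its retention period. This is a minimal sketch. It assumes each migration happens exactly when the current format reaches end of life. The 30-year retention period in the example is hypothetical.

```python
import math


def format_migrations(retention_years: float, format_lifespan_years: float) -> int:
    """Count the format migrations needed to keep a record readable.

    Assumes the record starts life in a brand-new format and must be
    migrated whenever its current format reaches end of life before the
    retention period has expired.
    """
    if format_lifespan_years <= 0:
        raise ValueError("format lifespan must be positive")
    # Number of format "generations" spanned, minus the original one.
    generations = math.ceil(retention_years / format_lifespan_years)
    return max(0, generations - 1)


for lifespan in (5, 7):
    print(lifespan, format_migrations(30, lifespan))
```

With a 30-year retention period, a 5-year format lifespan means five migrations and a 7-year lifespan means four. Each migration is another chance to lose information.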

So evidently we have to care but the question I am asking myself is, what should be the response of the Content Management industry to this problem?

By way of one example, here is a link to an EMC press release regarding the Kennedy Presidential Library http://www.emc.com/about/news/press/us/2006/20060609-4439.htm


attribution, creative commons: http://www.flickr.com/photos/johnmcnab/3316278028/

This project was to build a digital library consisting of the entire collection of papers, documents, photographs and audio recordings of President John F. Kennedy, with the aim of eventually making them accessible to citizens throughout the world via the Kennedy Presidential Library and Museum’s website — www.jfklibrary.org.

This solution addresses the problem of storing and retaining a specific set of historical records using EMC hardware and software, and as such the content becomes managed and structured. To a certain extent this safeguards the content for future generations, as long as the solution is maintained (that is, the hardware and software are managed, maintained and kept current).
Of course this won’t protect against a complete breakdown in civilisation but I suspect if that happened we may not care too much about lost content!

A potentially bigger issue is ensuring that data formats continue to be supported. Many content management platforms integrate with universal viewers such as Brava. These allow organisations to view multiple document formats without needing the native software on every client.

So, as long as content is managed, structured and the format is recognised and continues to be supported by the viewer then there’s no problem, right?
A very simplistic view, but given that 85% of all content is unmanaged (over 1 ZB), we really are in danger of losing a lot of data…
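Those two conditions can be checked mechanically against a content inventory: the content must be managed, and its format must still be supported by the viewer. The sketch below is hypothetical. The inventory fields and the set of supported formats are illustrative assumptions, not Brava’s actual format list.

```python
def risk_report(inventory, supported_formats):
    """List items that fail either preservation condition, with reasons.

    inventory: iterable of dicts with "id", "format" and "managed" keys.
    supported_formats: lower-case file extensions the viewer can render.
    """
    report = []
    for item in inventory:
        reasons = []
        if not item["managed"]:
            reasons.append("unmanaged")
        if item["format"].lower() not in supported_formats:
            reasons.append("unsupported format")
        if reasons:
            report.append((item["id"], reasons))
    return report


# Hypothetical inventory and viewer capability list, for illustration only.
inventory = [
    {"id": 1, "format": "PDF", "managed": True},
    {"id": 2, "format": "wpd", "managed": True},
    {"id": 3, "format": "docx", "managed": False},
]
supported = {"pdf", "docx", "tiff"}

print(risk_report(inventory, supported))
# [(2, ['unsupported format']), (3, ['unmanaged'])]
```

The catch, of course, is that unmanaged content rarely appears in any inventory in the first place.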

So why are we in this position?
A very good question. You would think that an organisation would value its information, as it represents its accumulated intellectual property. Unfortunately, in my opinion, the industry has done a poor job of selling the value of traditional content management, and implementations are viewed as complicated and potentially prone to failure.

This is not necessarily due to the products themselves but content management does require a considerable amount of planning and change management – I worked with ERP for many years and content management is much more of a change for organisations.

These projects are not a technology drop, and if treated as such they are likely to fail. If they are approached correctly, however, the results can be superb.
The benefits are not only that valuable IP will be available ‘in perpetuity’ but once that information is captured it can be used for business decision making purposes – content is no longer just created, distributed, maintained and disposed of – now information is increasingly being interacted with.

The logical progression of traditional content management is what are called Composite Content Applications. EMC have taken a step towards this evolution with the xCP Case Management suite. (I’m sure someone will jump on this and tell me that CCA and Case Management aren’t the same, but it’s easier to articulate Case Management than CCA.) Many other players are jumping on the case management bandwagon, so expect the major ECM developments to be in this area (a lot more on this in future blogs, I’m sure).

Anyway, I digress. Information preservation is a serious problem for us all. If the content management industry has many of the answers, why is there still so much unmanaged data?

Maybe we’re all praying for divine intervention?

