Document Conversion
This document is suitable reading for anyone contemplating, or in the process of, converting and cleaning a batch of word processing documents.
Basic Principles
The Prospect Generally Doesn’t Know Their Source Files
Whether I am dealing with an intermediary or directly with the prospective client, and whether we are discussing 200 or 20,000 documents, I have found that the end user does not know their documents.
The documents are in daily use, and are used to generate clones or hard-copy forms, but no one has really looked under the hood to see how the documents are built.
Often the end-user will be aware that there is something wrong, because checks and balances are in place to reduce the disturbance to editing.
Sometimes each page is isolated as a section (a sign that hard-coded page numbering has been implemented).
Sometimes we see a mixture of numbered styles, sequence fields, and plain hard-coded numbering evident in a single set of numbered paragraphs.
We often see numbering out of sequence – usually a sign of hard-coded level numbering coupled with temporary workers or an acute shortage of time to do the job right in the first place.
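Out-of-sequence hard-coded numbering is one of the easier faults to recognize mechanically. The following is a minimal sketch only, assuming the paragraphs have already been extracted as plain strings and that a hard-coded number looks like "3." or "12)" at the start of a paragraph. The function name and the pattern are our illustration, not part of any delivered tool.

```python
import re
from typing import List, Tuple

# Assumed pattern: a typed-in leading number such as "3." or "12)".
HARD_CODED = re.compile(r"^\s*(\d+)[.)]\s+")


def out_of_sequence(paragraphs: List[str]) -> List[Tuple[int, int, int]]:
    """Return (paragraph index, expected number, found number) for each break."""
    problems = []
    expected = None
    for index, text in enumerate(paragraphs):
        match = HARD_CODED.match(text)
        if match is None:
            continue  # not a hard-numbered paragraph, so skip it
        found = int(match.group(1))
        if expected is not None and found != expected:
            problems.append((index, expected, found))
        expected = found + 1  # resynchronize on the number actually found
    return problems
```

A run such as "1.", "2.", "4.", "5." reports a single break at the paragraph numbered 4, because the check resynchronizes after each break.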
In general, we are the first to proclaim in a loud voice, and in writing, that there is much wrong in the kingdom, and it will cost time and hence money to make things “right”.
Much of the cost is accrued to defining what is “wrong”, and hence what can be considered to be “right”.
The definition of “wrong” allows us to specify rules for recognition – how we can detect areas that need our attention.
The definition of “right” allows us to specify the action to be taken once we recognize a wrong.
Together the definitions of right and wrong allow us to specify an acceptance test, by which means we will be paid for our services.
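The relationship between recognition, action and acceptance test can be sketched in a few lines. This is an illustration under stated assumptions, not our production rules table: paragraphs are treated as plain strings, and the names Rule, RULES, cleanse and acceptance_failures are hypothetical. The two sample rules stand in for whatever definitions of "wrong" and "right" are agreed with the client.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    """One row of a hypothetical rules table."""
    name: str
    is_wrong: Callable[[str], bool]   # recognition: does this paragraph need attention?
    make_right: Callable[[str], str]  # action: how to correct it


# Two illustrative rules only; real rules are agreed with the client.
RULES: List[Rule] = [
    Rule(
        name="multiple spaces",
        is_wrong=lambda p: re.search(r" {2,}", p) is not None,
        make_right=lambda p: re.sub(r" {2,}", " ", p),
    ),
    Rule(
        name="trailing white space",
        is_wrong=lambda p: p != p.rstrip(),
        make_right=lambda p: p.rstrip(),
    ),
]


def cleanse(paragraphs: List[str]) -> List[str]:
    """Apply each rule's action wherever its recognition test fires."""
    result = []
    for text in paragraphs:
        for rule in RULES:
            if rule.is_wrong(text):
                text = rule.make_right(text)
        result.append(text)
    return result


def acceptance_failures(paragraphs: List[str]) -> List[str]:
    """Acceptance test: list every violation that remains."""
    return [
        f"paragraph {i}: {rule.name}"
        for i, text in enumerate(paragraphs)
        for rule in RULES
        if rule.is_wrong(text)
    ]
```

The point of the design is that the acceptance test reuses the same recognition functions as the conversion, so "delivered" means that no rule still fires on the output.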
The Prospect Wants A Magic Wand To “Make Them Clean”
We can pretty well guarantee this. If the client wanted to do any work at all, they would have investigated their documents and would know that there is no magic wand, no overnight process to solve their woes.
We do run conversions overnight, but not at the drop of a hat, only after careful review of the client’s needs and expectations.
Usually Provides Fewer Than Four Documents
These will be either the four documents that prompted the flurry of activity, or else documents chosen from various people in the office, based probably on personality.
There is no valid statistical sample, nor will there be any quantitative justification for the selection.
We can consider the selection as non-representative of the population as a whole, but can use the selection to justify some of our general statements, which are presented in this document.
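For contrast, a simple random sample is easy to draw once a list of file names exists. The sketch below is illustrative only; the function name draw_sample and its parameters are our assumptions, and the sample size is left for the caller to choose.

```python
import random
from typing import List, Optional


def draw_sample(file_names: List[str], size: int, seed: Optional[int] = None) -> List[str]:
    """Pick up to `size` files uniformly at random, without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    population = sorted(file_names)  # sort so the result does not depend on input order
    return rng.sample(population, min(size, len(population)))
```

Recording the seed alongside the sample lets both parties reproduce the selection later.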
Often Provides Only A Printout (Hard-Copy)
The hard-copy format gives us some idea of some of the keyboard-text problems, but cannot give us any idea of automatic numbering (whether by {Sequence} fields, Heading styles or list number styles). Nor does the hard-copy say anything about multiple white-space characters as distinct from proper (styled) spacing.
Has Recently Received Training, Doesn’t Understand Styles
Training rarely proceeds in synchronization with document conversion. Training is easy to schedule; you hire someone who knows Word, your staff take a class, and in theory your people know all about Word.
In practice a course in styles and templates alone ought to consume about a week, and who has time for that, right?
So in practice no one has really understood styles and their implications, which is why we are useful.
We will tell you that the main task is recognizing text, and applying styles, but it will probably take several meetings before you really believe us when we say that you can define the characteristics of the styles afterwards.
Has No Template, Wants A Template
There will be no corporate or departmental statement on standards of formatting, and certainly there will be no defined hierarchy of styles.
Has No Real Concept Of Clean Now And Clean Later
Like tree roots that continue to surface in a wheat paddock, issues with documents will crop up after each pass. It is always so. It is the nature of examination of documents followed by a deeper understanding of what can be done. It is not a blinding revelation that comes all at once.
Won’t Want Iterative Solutions But Will Iterate By Telephone
There will be no time to plan the conversion so that it proceeds in a well-managed manner, but there will be time after it is completed to do post-game analysis, none of which will be in writing.
What We Must Provide
We ask that the prospect read this document and sign off on an understanding of what we say.
A Simple Top-Level Analysis Of Documents (“Quant”ifier)
We ask that the prospect run our free read-only Quantifier to gain a first impression of the nature of the documents being proposed. See QuantifierSample.doc for an example.
Session To Introduce & Demonstrate Docle
At our usual rates we will travel to the client site and demonstrate the Document Cleanser on sample documents. The prospect will watch it run, will examine the rules table, and will inspect the output.
Obtain Estimate Of Data Base (Number And Types Of Documents)
We will issue the prospect with a copy of our Quantifier to generate a profile of the document base. If the prospect has baulky documents, we can run our files processor at our site to help the prospect to weed out the baulky documents.
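The kind of first-pass profile meant here can be approximated with a short read-only script. This is not the Quantifier itself, only a hedged sketch of the idea: it counts files by extension under a folder without opening any of them. The function name profile_document_base is hypothetical.

```python
from collections import Counter
from pathlib import Path
from typing import Dict


def profile_document_base(root: str) -> Dict[str, int]:
    """Count files under `root` by lower-cased extension, without opening any file."""
    counts: Counter = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files with no extension are grouped under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return dict(counts)
```

Even a count this crude gives a first estimate of the number and types of documents before any conversion is specified.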
Consultation (1/2 Or 1-Day To Start Them Thinking)
At our usual rates we will travel on-site to provide information groundwork for the prospect’s collection of documents. At this session we will ask some questions and indicate to the prospect the nature of their real problems and the likely cost of the solution.
An End-Of-Consult To Let Them See How Much They Have Gained
After the initial consultation the prospect is in a position to make a decision. The decision might be to abandon attempts at document conversion; sometimes there is little benefit gained in converting an archived collection.
Estimate Of Delivery Time
We will provide an initial estimate of delivery time based on our real-world experience of the time it takes to specify the conversion to be applied to a batch of documents.
Acceptance Test
We will specify an acceptance test agreeable to the prospect. The acceptance test will set the criteria by which the event “delivery” is judged and on which payment of the balance of all fees is made.
Training
We will provide training as required, especially so in those areas of documents which come under our umbrella of “conversion and cleansing”.
Well-Documented Description Of What Will Be Included
All our work is submitted in written form. While we are happy to discuss matters by telephone, we will always provide a typed copy of the conversation, our understandings, our promises and our expectations.
Well-Documented Description Of What Will Be Excluded
No surprises. We will provide explicit descriptions of what will be excluded from the processing by prior agreement with the prospect.
How We Do It
We Have A Set Of Basic Rules Which Have Worked Generally In The Past
While each client is different, clients share common problems with non-standard documents. Typically documents have an excessive number of redundant paragraph marks, mixtures of numbering styles, local formatting, and so on.
Our experience guides us to propose that the most common faults be corrected early on in the conversion process.
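As an example of an early correction, redundant paragraph marks show up as paragraphs that contain nothing but white space. The sketch below assumes the paragraphs are available as plain strings; the function name is hypothetical, and a real conversion would replace the lost vertical spacing with spacing defined in the style.

```python
from typing import List, Tuple


def remove_empty_paragraphs(paragraphs: List[str]) -> Tuple[List[str], int]:
    """Return the paragraphs that contain text, plus a count of those removed."""
    kept = [p for p in paragraphs if p.strip()]
    return kept, len(paragraphs) - len(kept)
```

The count of removed paragraphs is worth reporting, since it gives the client a measure of how much hand spacing the documents contained.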
We Must Create A Template
Much of our work is geared towards the use of paragraph and character styles, and provision of our conversion macros for ongoing use by the end-users.
We bundle all this work in a corporate template for installation on the client site.
We Must Create A Macro That Attaches That Template
Part of our process includes attaching a corporate template to each document. We pass that macro on to the client so that the end-users can retrofit existing documents.
We Must Define Styles In That Template
Much of our process involves applying styles to existing text in each document. We define and store these styles in a corporate template and pass that template on to the client.
We Must Assign Shortcut Keys To Those Styles
We assign a set of shortcut keys to the more common styles. For example, to bold text the end-user can press Ctrl-Shift-B, which applies the character style csBold to the selected text.
We Must Define Macros In That Template
We store our formatting macros in the corporate template and pass that template on to the client so that the end-users can retrofit existing documents.
We Must Assign Toolbar Buttons And Shortcut Keys To Those Macros
We create a toolbar, housed in the corporate template, and on it place a menu with toolbar buttons for ease of use by the end-users.
Costs And Terms
We apply our standard rates and terms and conditions as a minimum set to all projects.
Non-Disclosure Statement
We are happy to sign any non-disclosure statement; such a statement confirms our integrity.
Prospect Must Allocate Resources, Especially Technical Review Time
No project can proceed without some supervision by the client. If nothing else we need client time to approve our process, to confirm the acceptance test, and to apply the acceptance test within the delivery period.
Terms 50% Deposit, Balance Within Ten Business Days Of Delivery
We ask for a deposit of about 50% of the estimated cost before work starts. Then we work. Then we deliver. We ask for the balance of payment on the work within ten business days of delivery.
Costs As Per Our Spreadsheet Calculator
For large batch conversion jobs we operate the conversion as a service. We apply a basic formula to calculate charges.
The most common variables for the calculation are:
Original number of files reported on disk
Unsuitable format
Aberrations
Client macros
Consultation time
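The shape of such a calculation can be sketched as follows. This is not our spreadsheet formula. It is a hypothetical linear model built from the variables listed above, on the assumption that files in an unsuitable format are subtracted from the file count and that aberrations, client macros and consultation time are each charged per unit. Every rate is a placeholder supplied by the caller, and all names are ours.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class JobInputs:
    files_on_disk: int         # original number of files reported on disk
    unsuitable_format: int     # assumed: files excluded because of their format
    aberrations: int           # assumed: files needing individual attention
    client_macros: int         # assumed: client macros to be reviewed
    consultation_hours: float  # consultation time


@dataclass
class Rates:
    """Placeholder rates supplied by the caller; none are real prices."""
    per_file: float
    per_aberration: float
    per_macro: float
    per_hour: float


def estimate(job: JobInputs, rates: Rates) -> float:
    """Hypothetical linear estimate; the real formula is not given here."""
    convertible = job.files_on_disk - job.unsuitable_format
    if convertible < 0:
        raise ValueError("unsuitable_format cannot exceed files_on_disk")
    return (
        convertible * rates.per_file
        + job.aberrations * rates.per_aberration
        + job.client_macros * rates.per_macro
        + job.consultation_hours * rates.per_hour
    )


def deposit_and_balance(total: float, deposit_fraction: float = 0.5) -> Tuple[float, float]:
    """Split an estimate into the deposit (about 50%) and the balance due on delivery."""
    deposit = total * deposit_fraction
    return deposit, total - deposit
```

With these assumptions, the deposit and balance follow directly from the estimate, which keeps the payment terms tied to figures the client has already seen.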