Friday, March 30, 2012

I found a bug in MSSQL DTS import wizard

We have a textfile that we import in Excel(provided by our mainframe folks).
It's fixed width with 142 columns. When I import it in excel or ms access,
it gives me about ~34000 rows. But when I import using DTS package that is
exported to Access mdb, rows becomes ~17000 rows
And another hurdle that I'm faced is that there are rows that contains
combined data such as street, city and state all together without spaces.
Unfortunately, we can't modify the source text file. However, I do have a
fieldmarker(save a .txt file) which I always use in Excel so that when I
import it, I can select the area to create a vertical line just right that
marker.
Here is a sample of my fieldmarker.
12345p1234567p12pp123456p
Everytime i see a p, I click it with a mouse to create a vertical line in
fixed width importing. In that case, there are 5 columns. These are all
similar to ms access or excel.
The bug.
Please see this link below about the bug I found in MSSQL DTS Import wizard.
It doesn't happen in Excel or MS Access. Excel and MS Access can handle it
properly.
http://restricted.dyndns.org/dtsimportissue.html
Please help. Solution needed very badly!
Thanks,
Neil
"Neil" <neil-on-ht@.restricted.dyndns.org> wrote in message
news:Oj$m4gz5EHA.1292@.TK2MSFTNGP10.phx.gbl...
> We have a textfile that we import in Excel(provided by our mainframe
> folks). It's fixed width with 142 columns. When I import it in excel or ms
> access, it gives me about ~34000 rows. But when I import using DTS package
> that is exported to Access mdb, rows becomes ~17000 rows
> And another hurdle that I'm faced is that there are rows that contains
> combined data such as street, city and state all together without spaces.
> Unfortunately, we can't modify the source text file. However, I do have a
> fieldmarker(save a .txt file) which I always use in Excel so that when I
> import it, I can select the area to create a vertical line just right that
> marker.
> Here is a sample of my fieldmarker.
> 12345p1234567p12pp123456p
> Everytime i see a p, I click it with a mouse to create a vertical line in
> fixed width importing. In that case, there are 5 columns. These are all
> similar to ms access or excel.
> The bug.
> Please see this link below about the bug I found in MSSQL DTS Import
> wizard. It doesn't happen in Excel or MS Access. Excel and MS Access can
> handle it properly.
> http://restricted.dyndns.org/dtsimportissue.html
> Please help. Solution needed very badly!
It looks as though DTS is using the CRLF in the first line to define the
line lengths, and concatenating data in the rows because they are too short
and therefore incomplete records. What happens if you trim the first line
back to the same length as the other lines? Or how about padding the lines
out so they are all the same length?
According to http://support.microsoft.com/default...b;en-us;271247
if any rows are longer than expected then additional characters appear
wrapped in the DTS wizard, but are actually discarded.
However, in article
http://support.microsoft.com/default...b;en-us;247360 it mentions
that DTS expects a fixed length file to have lines of fixed length, and
having shorter lines will cause additional characters to be read from the
next line.
So the answer appears to be that in a fixed length import, make sure every
line is the same length to avoid this problem.
Dan
|||"Daniel Crichton" <msnews@.worldofspack.co.uk> wrote in message
news:uZDTPI05EHA.3076@.TK2MSFTNGP15.phx.gbl...
> It looks as though DTS is using the CRLF in the first line to define the
> line lengths, and concatenating data in the rows because they are too
> short and therefore incomplete records. What happens if you trim the first
> line back to the same length as the other lines? Or how about padding the
> lines out so they are all the same length?
> According to
> http://support.microsoft.com/default...b;en-us;271247 if any rows
> are longer than expected then additional characters appear wrapped in the
> DTS wizard, but are actually discarded.
> However, in article
> http://support.microsoft.com/default...b;en-us;247360 it
> mentions that DTS expects a fixed length file to have lines of fixed
> length, and having shorter lines will cause additional characters to be
> read from the next line.
> So the answer appears to be that in a fixed length import, make sure every
> line is the same length to avoid this problem.
>
Ok. Then we're out of luck then. I will contact the vendor if there will be
data for those columns in the future. If not, I can just delete it for now
to avoid the problem. It's just that MS Excel and Access support it. I don't
know why they didn't do that for DTS
Thanks.
Neil
|||Neil,
I don't quite understand your explanation or the online example. You
say this is a fixed-width file, but the example you give online shows a
first line that is much longer than the second line. Fixed-width means
every line of the text file has the same length. Your file has lines of
different lengths, but it also doesn't have a separator character
between each column, so the wizard won't help you, and you'll need to do
something else, I guess.
Maybe one of these will work:
Import using BULK INSERT with a format file.
Preprocess the file with Unix-like utilities to add delimiters, then
import as delimited
Import the file into a staging table with one long column, then split
it up with
insert into FinalDestinationTable
select substring(s,1,4), substring(s,5,7), ...
Steve Kass
Drew University
Neil wrote:

>We have a textfile that we import in Excel(provided by our mainframe folks).
>It's fixed width with 142 columns. When I import it in excel or ms access,
>it gives me about ~34000 rows. But when I import using DTS package that is
>exported to Access mdb, rows becomes ~17000 rows
>And another hurdle that I'm faced is that there are rows that contains
>combined data such as street, city and state all together without spaces.
>Unfortunately, we can't modify the source text file. However, I do have a
>fieldmarker(save a .txt file) which I always use in Excel so that when I
>import it, I can select the area to create a vertical line just right that
>marker.
>Here is a sample of my fieldmarker.
>12345p1234567p12pp123456p
>Everytime i see a p, I click it with a mouse to create a vertical line in
>fixed width importing. In that case, there are 5 columns. These are all
>similar to ms access or excel.
>The bug.
>Please see this link below about the bug I found in MSSQL DTS Import wizard.
>It doesn't happen in Excel or MS Access. Excel and MS Access can handle it
>properly.
>http://restricted.dyndns.org/dtsimportissue.html
>Please help. Solution needed very badly!
>Thanks,
>Neil
>
>
|||Sorry if my example is to confusing. The first row is actually coming
from a different text file. I call it field separator. I just wrote that
website for reference. My real field separator is about 2,050 characters
long. the data really starts at row 2 and they are all in fixed length.
Going back to the field separators, I use it only so that when I import
it using Fixed width, I will be able to tell where I need to click the
mouse.
Let say the mainframe file specs goes like this:
loanno text(10)
datewritten text(8)
code text(2)
state text(2)
zipcode text(5)
firstname text(5)
lastname text(5)
If the data was like this, it would have been ok
1234567890 12/21/04 DD IL 60061 frank burns
1234567891 12/21/04 DD IL 60061 ana saige
1234567892 12/21/04 DD IL 60061 polly yuigo
1234567893 12/21/04 DD IL 60061 jake snake
But our mainframe data is like this,
123456789012/21/04DDIL60061frankburns
123456789112/21/04DDIL60061ana saige
123456789212/21/04DDIL60061pollyyuigo
123456789312/21/04DDIL60061jake snake
For that example, I could have come up with a field marker like this
123456789p1234567p1p1p1234p1234p1234p. It works great in Excel and
Access. The p marker is just for me. It can be any. Unfortunately, the
target table for the DTS have more columns. I tried modify the data
above but still, DTS is importing it improperly. It's not intelligent
enough to pad it with blank spaces. I can pad using a different tool but
that's not a good solution. That's another point of failure.
I tried this data,
123456789012/21/04DDIL60061frankburns
123456789112/21/04DDIL60061ana saige
123456789212/21/04DDIL60061pollyyuigo
123456789312/21/04DDIL60061jake snakeBBBCCCCCDDDD
the data was still concatenated with the previous lines. So looks like,
DTS import is not honoring the {CR}{LF} but I'm 100% very sure that
there is an {CR}{LF}
Like what I said, it works perfect in Excel and Access. So I guess, this
is still considered a bug.
Thanks.
*** Sent via Developersdex http://www.codecomments.com ***
Don't just participate in USENET...get rewarded for it!
|||I found a better solution in MS Access. I found it inside the Advanced
button when we do the import table wizard.
It's the table specification. It's awesome. Is this possible in DTS?
Thanks.
"Neil Camara" <neil-on-ht@.restricted.dyndns.org> wrote in message
news:ea%23nwe75EHA.2568@.TK2MSFTNGP10.phx.gbl...
> Sorry if my example is to confusing. The first row is actually coming
> from a different text file. I call it field separator. I just wrote that
> website for reference. My real field separator is about 2,050 characters
> long. the data really starts at row 2 and they are all in fixed length.
> Going back to the field separators, I use it only so that when I import
> it using Fixed width, I will be able to tell where I need to click the
> mouse.
> Let say the mainframe file specs goes like this:
> loanno text(10)
> datewritten text(8)
> code text(2)
> state text(2)
> zipcode text(5)
> firstname text(5)
> lastname text(5)
> If the data was like this, it would have been ok
> 1234567890 12/21/04 DD IL 60061 frank burns
> 1234567891 12/21/04 DD IL 60061 ana saige
> 1234567892 12/21/04 DD IL 60061 polly yuigo
> 1234567893 12/21/04 DD IL 60061 jake snake
> But our mainframe data is like this,
>
> 123456789012/21/04DDIL60061frankburns
> 123456789112/21/04DDIL60061ana saige
> 123456789212/21/04DDIL60061pollyyuigo
> 123456789312/21/04DDIL60061jake snake
> For that example, I could have come up with a field marker like this
> 123456789p1234567p1p1p1234p1234p1234p. It works great in Excel and
> Access. The p marker is just for me. It can be any. Unfortunately, the
> target table for the DTS have more columns. I tried modify the data
> above but still, DTS is importing it improperly. It's not intelligent
> enough to pad it with blank spaces. I can pad using a different tool but
> that's not a good solution. That's another point of failure.
> I tried this data,
> 123456789012/21/04DDIL60061frankburns
> 123456789112/21/04DDIL60061ana saige
> 123456789212/21/04DDIL60061pollyyuigo
> 123456789312/21/04DDIL60061jake snakeBBBCCCCCDDDD
> the data was still concatenated with the previous lines. So looks like,
> DTS import is not honoring the {CR}{LF} but I'm 100% very sure that
> there is an {CR}{LF}
> Like what I said, it works perfect in Excel and Access. So I guess, this
> is still considered a bug.
> Thanks.
>
>
>
> *** Sent via Developersdex http://www.codecomments.com ***
> Don't just participate in USENET...get rewarded for it!
|||Sorry, Neil, but I still don't understand. You have a text file with
equal length lines, and with <CRLF> ending each line, and dts doesn't
respect the end-of-line markers? You say you are 100% very sure that
there is an {CR}{LF}. Have you used a hex editor to be certain?
My earlier suggest to use BULK INSERT may work for you if indeed this is
a bug. You can specify a format file and bulk insert into a table that
has already been created.
SK
Neil Camara wrote:

>Sorry if my example is to confusing. The first row is actually coming
>from a different text file. I call it field separator. I just wrote that
>website for reference. My real field separator is about 2,050 characters
>long. the data really starts at row 2 and they are all in fixed length.
>Going back to the field separators, I use it only so that when I import
>it using Fixed width, I will be able to tell where I need to click the
>mouse.
>Let say the mainframe file specs goes like this:
>loanno text(10)
>datewritten text(8)
>code text(2)
>state text(2)
>zipcode text(5)
>firstname text(5)
>lastname text(5)
>If the data was like this, it would have been ok
>1234567890 12/21/04 DD IL 60061 frank burns
>1234567891 12/21/04 DD IL 60061 ana saige
>1234567892 12/21/04 DD IL 60061 polly yuigo
>1234567893 12/21/04 DD IL 60061 jake snake
>But our mainframe data is like this,
>
>123456789012/21/04DDIL60061frankburns
>123456789112/21/04DDIL60061ana saige
>123456789212/21/04DDIL60061pollyyuigo
>123456789312/21/04DDIL60061jake snake
>For that example, I could have come up with a field marker like this
>123456789p1234567p1p1p1234p1234p1234p. It works great in Excel and
>Access. The p marker is just for me. It can be any. Unfortunately, the
>target table for the DTS have more columns. I tried modify the data
>above but still, DTS is importing it improperly. It's not intelligent
>enough to pad it with blank spaces. I can pad using a different tool but
>that's not a good solution. That's another point of failure.
>I tried this data,
>123456789012/21/04DDIL60061frankburns
>123456789112/21/04DDIL60061ana saige
>123456789212/21/04DDIL60061pollyyuigo
>123456789312/21/04DDIL60061jake snakeBBBCCCCCDDDD
>the data was still concatenated with the previous lines. So looks like,
>DTS import is not honoring the {CR}{LF} but I'm 100% very sure that
>there is an {CR}{LF}
>Like what I said, it works perfect in Excel and Access. So I guess, this
>is still considered a bug.
>Thanks.
>
>
>
>*** Sent via Developersdex http://www.codecomments.com ***
>Don't just participate in USENET...get rewarded for it!
>
|||"Steve Kass" <skass@.drew.edu> wrote in message
news:OKuZfY%235EHA.2876@.TK2MSFTNGP12.phx.gbl...
> Sorry, Neil, but I still don't understand. You have a text file with
> equal length lines, and with <CRLF> ending each line, and dts doesn't
> respect the end-of-line markers? You say you are 100% very sure that
> there is an {CR}{LF}. Have you used a hex editor to be certain?
See my earlier reply with articles from the MS KB that mention that the
fixed width import requires all lines to be the same length, otherwise it
concatenates the data from successive lines to pad shorter lines out. Looks
like the CRLF is only used for the first line to determine the fixed row
length, all other lines are then dealt with using that length - extra
characters on lines are discarded, lines that are too short are treated as
continuing onto the next line.
Dan
|||Ok then. I'd have to accept that that is the way DTS works. I have to
fix my source by padding spaces. I can only say that Access or Excel
pads everything automatically for us. At least, that article explains
what I have found out.
*** Sent via Developersdex http://www.codecomments.com ***
Don't just participate in USENET...get rewarded for it!
sql

No comments:

Post a Comment