Split one file into multiple files based on a delimiter

Nice programing

nicepro 2020. 11. 4. 08:26

I have one file with -| as a delimiter after each section... I need to create a separate file for each section using unix.

Example of the input file

wertretr
ewretrtret
1212132323
000232
-|
ereteertetet
232434234
erewesdfsfsfs
0234342343
-|
jdhg3875jdfsgfd
sjdhfdbfjds
347674657435
-|

Expected result in file 1

wertretr
ewretrtret
1212132323
000232
-|

Expected result in file 2

ereteertetet
232434234
erewesdfsfsfs
0234342343
-|

Expected result in file 3

jdhg3875jdfsgfd
sjdhfdbfjds
347674657435
-|

A one-liner, no programming. (Apart from the regular expression.)

csplit --digits=2  --quiet --prefix=outfile infile "/-|/+1" "{*}"

awk '{print $0 " -|" > ("file" NR)}' RS='-\\|' input-file

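Explanation (edited):

RS is the record separator, and this solution uses a GNU awk extension that allows it to be more than one character. NR is the record number.

The print statement prints each record, followed by " -|", into a file whose name contains the record number.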
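
To see what splitting on a multi-character record separator yields, here is a minimal Python sketch. It is not part of the original answer, and the function name split_records is made up for illustration. It cuts the text at every occurrence of the separator and numbers the pieces from 1, the way NR does.

```python
def split_records(text, separator="-|"):
    """Cut text at every occurrence of separator.

    Returns a list of (record_number, record) pairs, numbered from 1
    in the same spirit as awk's NR. The separator itself is not part
    of any record.
    """
    records = text.split(separator)
    # If the text ends exactly with the separator, str.split leaves an
    # empty string at the end; drop it so it does not count as a record.
    if records and records[-1] == "":
        records.pop()
    return list(enumerate(records, start=1))
```

With input shaped like the sample, every piece after the first starts with the newline that followed the previous -|, and the lone newline after the last -| becomes a record of its own. That is worth keeping in mind when comparing the result with the expected files.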

Debian has csplit, but I don't know whether it is common to all, most, or other distributions. If it isn't, it shouldn't be too hard to track down the source and compile it.

I solved a slightly different problem, where the file contains lines that name the file the following text should go into. This Perl code does the trick for me:

#!/path/to/perl -w

#comment the line below for UNIX systems
use Win32::Clipboard;

# Get command line flags

use Getopt::Long;

# --mff=MARKER sets the marker that precedes each output filename;
# the default marker is "-#-"
my $mff = "-#-";
GetOptions('mff=s' => \$mff);

# GetOptions removes the options it recognizes from @ARGV,
# so only the input filename(s) should remain at this point
if (@ARGV == 0) {
    print STDERR "usage: ncsplit.pl [--mff=MARKER] filename.txt [...] \n\nThe mff is found on a line followed by a filename.  All of the contents of filename.txt are written to that file until another mff is found.\n";
    exit;
}

print ("using file switch=", $mff, "\n\n");

while($_ = shift @ARGV) {
    if(-f "$_") {
        push @filelist, $_;
    }
}

# Could be more than one file name on the command line, 
# but this version throws away the subsequent ones.

$readfile = $filelist[0];

open SOURCEFILE, "<$readfile" or die "File not found...\n\n";
#print SOURCEFILE;

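while (<SOURCEFILE>) {
    # a marker line names the next output file;
    # \Q...\E makes the marker match literally rather than as a regex
    if (/^\Q$mff\E (.*)$/) {
        $outname = $1;
        open OUTFILE, ">$outname" or die "Cannot open $outname\n";
        print "opened $outname\n";
    }
    # lines before the first marker have no output file and are skipped
    elsif (defined $outname) {
        print OUTFILE $_;
    }
}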
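
For readers who don't use Perl, the same idea can be sketched in Python. This is only an illustration under assumptions: the function name group_by_marker is hypothetical, and instead of writing files it returns a dictionary so the grouping logic is easy to inspect. A marker line looks like "-#- name.txt", and every following line belongs to that name until the next marker.

```python
def group_by_marker(lines, marker="-#-"):
    """Map each filename announced by a marker line to the lines that follow it."""
    files = {}
    current = None
    prefix = marker + " "
    for line in lines:
        if line.startswith(prefix):
            # The rest of the marker line is the output filename.
            current = line[len(prefix):].rstrip("\n")
            # Like opening a file with ">", a repeated name starts over.
            files[current] = []
        elif current is not None:
            files[current].append(line)
        # Lines before the first marker are ignored.
    return files
```

Writing the result out is then one loop over files.items(), opening each name with mode "w" and calling writelines on its list.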

The following command works for me. Hope it helps. Note that the delimiter lines themselves are not written to the output files.

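awk 'BEGIN{file = 0; filename = "output_" file ".txt"}
    # the pipe must be escaped: an unescaped /-|/ also matches the empty string, i.e. every line
    /^-\|$/ {close(filename); file++; filename = "output_" file ".txt"; next}
    {print $0 > filename}' input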

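You can also use awk. I'm not very familiar with awk, but the following seemed to work for me. It generated part1.txt, part2.txt, part3.txt, and part4.txt. Note that the last partn.txt file it generates is empty. I'm not sure how to fix that, but I'm sure it could be done with a little tweaking. Any suggestions?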

awk_pattern file:

BEGIN{ fn = "part1.txt"; n = 1 }
{
   print > fn
   if (substr($0,1,2) == "-|") {
       close (fn)
       n++
       fn = "part" n ".txt"
   }
}

bash command:

awk -f awk_pattern input.file
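
One way to avoid an empty last file is to open each output file only when there is a line to write into it. The following is a minimal Python sketch of that idea using the same naming scheme (part1.txt, part2.txt, ...). The function name split_keep_delimiter and its parameters are hypothetical, not from the original answer.

```python
import os


def split_keep_delimiter(input_path, output_dir, delimiter="-|"):
    """Write part1.txt, part2.txt, ... into output_dir and return their paths.

    Each part ends with its delimiter line. A new part is opened lazily,
    so nothing is created after the final delimiter.
    """
    written = []
    out = None
    try:
        with open(input_path) as source:
            for line in source:
                if out is None:
                    # Open the next part only once there is a line for it.
                    path = os.path.join(output_dir, f"part{len(written) + 1}.txt")
                    out = open(path, "w")
                    written.append(path)
                out.write(line)
                if line.startswith(delimiter):
                    # The delimiter line closes the current part.
                    out.close()
                    out = None
    finally:
        if out is not None:
            out.close()
    return written
```

Calling split_keep_delimiter("input.file", ".") on the sample input should produce three files and no empty fourth one.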

Here is a Python 3 script that splits a file into multiple files, using the filenames given in the delimiters. Example input file:

# Ignored

######## FILTER BEGIN foo.conf
This goes in foo.conf.
######## FILTER END

# Ignored

######## FILTER BEGIN bar.conf
This goes in bar.conf.
######## FILTER END

Here is the script:

#!/usr/bin/env python3

import os
import argparse

# global settings
start_delimiter = '######## FILTER BEGIN'
end_delimiter = '######## FILTER END'

# parse command line arguments
parser = argparse.ArgumentParser()
parser.add_argument("-i", "--input-file", required=True, help="input filename")
parser.add_argument("-o", "--output-dir", required=True, help="output directory")

args = parser.parse_args()

# read the input file
with open(args.input_file, 'r') as input_file:
    input_data = input_file.read()

# iterate through the input data by line
input_lines = input_data.splitlines()
while input_lines:
    # discard lines until the next start delimiter
    while input_lines and not input_lines[0].startswith(start_delimiter):
        input_lines.pop(0)

    # corner case: no delimiter found and no more lines left
    if not input_lines:
        break

    # extract the output filename from the start delimiter
    output_filename = input_lines.pop(0).replace(start_delimiter, "").strip()
    output_path = os.path.join(args.output_dir, output_filename)

    # open the output file
    print("extracting file: {0}".format(output_path))
    with open(output_path, 'w') as output_file:
        # while we have lines left and they don't match the end delimiter
        while input_lines and not input_lines[0].startswith(end_delimiter):
            output_file.write("{0}\n".format(input_lines.pop(0)))

        # remove end delimiter if present
        if input_lines:
            input_lines.pop(0)

Finally, here is how you run it:

$ python3 script.py -i input-file.txt -o ./output-folder/

If you have csplit, use it.

If you don't, but you have Python... don't use Perl.

Lazy reading of the file

Your file may be too large to hold in memory all at once - reading line by line may be preferable. Assume the input file is named "samplein":

$ python3 -c "from itertools import count
with open('samplein') as file:
    for i in count():
        firstline = next(file, None)
        if firstline is None:
            break
        with open(f'out{i}', 'w') as out:
            out.write(firstline)
            for line in file:
                out.write(line)
                if line == '-|\n':
                    break"

cat file | ( I=0; : > file0; while IFS= read -r line; do printf '%s\n' "$line" >> "file$I"; if [ "$line" = '-|' ]; then I=$((I+1)); : > "file$I"; fi; done )

and the formatted version:

#!/bin/bash
cat FILE | (
  I=0
  : > file0                     # start with an empty first output file
  while IFS= read -r line       # -r and IFS= keep backslashes and leading spaces intact
  do
    printf '%s\n' "$line" >> "file$I"
    if [ "$line" = '-|' ]
    then
      I=$((I+1))
      : > "file$I"              # create (or truncate) the next output file
    fi
  done
)

Here is a perl code that will do the thing

#!/usr/bin/perl
open(FI,"file.txt") or die "Input file not found";
$cur=0;
open(FO,">res.$cur.txt") or die "Cannot open output file $cur";
while(<FI>)
{
    print FO $_;
    if(/^-\|/)
    {
        close(FO);
        $cur++;
        open(FO,">res.$cur.txt") or die "Cannot open output file $cur"
    }
}
close(FO);

This is the sort of problem I wrote context-split for: http://stromberg.dnsalias.org/~strombrg/context-split.html

$ ./context-split -h
usage:
./context-split [-s separator] [-n name] [-z length]
        -s specifies what regex should separate output files
        -n specifies how output files are named (default: numeric
        -z specifies how long numbered filenames (if any) should be
        -i include line containing separator in output files
        operations are always performed on stdin
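
The separator-as-regex idea from the help text can be sketched in a few lines of Python. This is not context-split's implementation, and how that tool places the separator line is not stated here. The sketch assumes a matching line closes the current chunk, and the names split_on_regex and include_separator are hypothetical.

```python
import re


def split_on_regex(lines, separator, include_separator=False):
    """Group lines into chunks, closing a chunk at each line that matches separator."""
    pattern = re.compile(separator)
    chunks = [[]]
    for line in lines:
        if pattern.search(line):
            if include_separator:
                # Keep the matching line at the end of the chunk it closes.
                chunks[-1].append(line)
            chunks.append([])
        else:
            chunks[-1].append(line)
    # Drop chunks that ended up empty, e.g. the one after a trailing separator.
    return [chunk for chunk in chunks if chunk]
```

For the sample input in this question, a separator of r"^-\|$" with include_separator=True gives one chunk per expected file.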

Reference URL: https://stackoverflow.com/questions/11313852/split-one-file-into-multiple-files-based-on-delimiter
