It seems to me that what you need is to access the sample wildcard using the wildcards object:
rule all:
    input: expand("out/{sample}_fastq.gz", sample = samples)

rule download_sample:
    output:
        "out/{sample}_fastq.gz"
    params:
        outdir = "out",
        threads = 16
    priority: 85
    shell: "parallel-fastq-dump --sra-id {wildcards.sample} --threads {params.threads} --outdir {params.outdir} --gzip "
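To make the substitution concrete, here is a rough plain-Python sketch of how a command template like the one above gets filled in once the sample wildcard is known. It only imitates the formatting step with str.format and is not Snakemake's actual implementation; render_command is a hypothetical helper and the accession IDs are placeholders.

```python
from types import SimpleNamespace

# Same template as the shell: line of the rule above.
TEMPLATE = (
    "parallel-fastq-dump --sra-id {wildcards.sample} "
    "--threads {params.threads} --outdir {params.outdir} --gzip"
)

# Stand-in for the params: section of the rule.
params = SimpleNamespace(outdir="out", threads=16)


def render_command(sample: str) -> str:
    """Fill the template for one sample (hypothetical helper, not a Snakemake API)."""
    wildcards = SimpleNamespace(sample=sample)
    # str.format supports attribute lookups such as {wildcards.sample}.
    return TEMPLATE.format(wildcards=wildcards, params=params)


# Placeholder accession IDs, one command per requested output file.
for sample in ["SRR000001", "SRR000002"]:
    print(render_command(sample))
```

Each requested out/{sample}_fastq.gz file yields its own command, which is why the rule itself never needs to loop over the samples.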
The first solution could be to use the run: section of the rule instead of the shell: section. This allows you to use Python code:
rule download_sample:
    #...
    run:
        for input_file in input:
            shell(f"parallel-fastq-dump --sra-id {input_file} --threads {params.threads} --outdir {params.outdir} --gzip")
This straightforward solution, however, is not idiomatic. From what I can see, you have a one-to-one relationship between input samples and output files. In other words, to produce one out/{sample}_fastq.gz file you need a single {sample}. The best solution would be to reduce your rule to one that makes a single file:
rule download_sample:
    input: "{sample}"
    output: "out/{sample}_fastq.gz"
    params:
        outdir = "out",
        threads = 16
    priority: 85
    shell: "parallel-fastq-dump --sra-id {input} --threads {params.threads} --outdir {params.outdir} --gzip "
Importantly, wildcards in input and output must be named identically. Most typically, the same wildcard is present in both input and output, but it is of course also possible to have wildcards only in the output but not the input section.
Because the number of output files is unknown beforehand, the checkpoint only defines an output directory. This time, instead of explicitly writing
In addition to a single wildcards argument, input functions can optionally take a groupid (with exactly that name) as a second argument; see Group-local jobs for details.
A Snakemake workflow defines a data analysis in terms of rules that are specified in the Snakefile. Most commonly, rules consist of a name, input files, output files, and a shell command to generate the output from the input:
rule NAME:
    input: "path/to/inputfile", "path/to/other/inputfile"
    output: "path/to/outputfile", "path/to/another/outputfile"
    shell: "somecommand {input} {output}"

rule NAME:
    input: "path/to/inputfile", "path/to/other/inputfile"
    output: "path/to/outputfile", somename = "path/to/another/outputfile"
    run:
        for f in input:
            ...
            with open(output[0], "w") as out:
                out.write(...)
        with open(output.somename, "w") as out:
            out.write(...)

shell("somecommand {output.somename}")

for line in shell("somecommand {output.somename}", iterable = True):
    ... # do something in python
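For intuition about what iterating over a command's output means, here is a plain-Python analogue built on the standard subprocess module. It is only a sketch of the idea behind iterable = True, not how Snakemake implements shell(); iter_command_lines is a hypothetical helper, and a Python one-liner stands in for somecommand so that the example runs anywhere.

```python
import subprocess
import sys


def iter_command_lines(args):
    """Yield the command's stdout line by line (hypothetical helper)."""
    with subprocess.Popen(args, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            # Strip only the trailing newline, keep other whitespace intact.
            yield line.rstrip("\n")
    # Leaving the with-block waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, args)


# Stand-in for "somecommand": a child Python process printing two lines.
stand_in = [sys.executable, "-c", "print('first'); print('second')"]

for line in iter_command_lines(stand_in):
    print("got:", line)
```

Lines are handed to the loop as the child process produces them, so large outputs do not have to be held in memory at once.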
rule complex_conversion:
    input:
        "{dataset}/inputfile"
    output:
        "{dataset}/file.{group}.txt"
    shell:
        "somecommand --group {wildcards.group} < {input} > {output}"

output: "{dataset,\d+}.{group}.txt"
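The effect of a constraint such as {dataset,\d+} can be illustrated with a small plain-Python sketch that turns an output pattern into a regular expression. This is an approximation for illustration only and not Snakemake's own code; pattern_to_regex and match are hypothetical helpers, the file names are made up, and the sketch assumes that an unconstrained wildcard behaves like the regex .+.

```python
import re


def pattern_to_regex(pattern: str) -> str:
    """Translate '{name}' and '{name,regex}' placeholders into named groups."""
    regex, pos = "", 0
    for m in re.finditer(r"\{(\w+)(?:,([^{}]+))?\}", pattern):
        regex += re.escape(pattern[pos:m.start()])  # literal text before the wildcard
        constraint = m.group(2) or ".+"             # assumed default: match anything
        regex += f"(?P<{m.group(1)}>{constraint})"
        pos = m.end()
    return regex + re.escape(pattern[pos:])


def match(pattern: str, filename: str):
    """Return the wildcard values for filename, or None if it does not match."""
    found = re.fullmatch(pattern_to_regex(pattern), filename)
    return found.groupdict() if found else None


# Without a constraint the split between the two wildcards is ambiguous.
print(match(r"{dataset}.{group}.txt", "101.a.b.txt"))
# -> {'dataset': '101.a', 'group': 'b'}

# With the constraint, dataset can only be digits.
print(match(r"{dataset,\d+}.{group}.txt", "101.a.b.txt"))
# -> {'dataset': '101', 'group': 'a.b'}

# A non-numeric dataset no longer matches at all.
print(match(r"{dataset,\d+}.{group}.txt", "abc.a.txt"))
# -> None
```

In this sketch the constraint both removes the ambiguity and rejects file names that should not be produced by the rule.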
We now tell Snakemake to make all these files by using the target rule name on the command line:
So far, we’ve told Snakemake what output files to generate by giving the names of the desired files on the command line. Often you want Snakemake to process all the available samples. How can we do this?
If you don’t specify a target rule name or any file names on the command line when running Snakemake, the default is to use the first rule in the Snakefile as the target. So if all_counts is defined at the top, before the other rules, you can simply say:
Giving the name of a rule to Snakemake on the command line only works when that rule has no wildcards in the outputs, because Snakemake has no way to know what the desired wildcards might be. You will see the error “Target rules may not contain wildcards.” This can also happen when you don’t supply any explicit targets on the command line at all, and Snakemake tries to run the first rule defined in the Snakefile.
$ mv reads original_reads
$ mkdir reads
$ cd reads
$ ln -s ../original_reads/* .
$ rename -v -s ref ref_ *
$ cd ..

# Input conditions and replicates to process
CONDITIONS = ["ref", "etoh60", "temp33"]
REPLICATES = ["1", "2", "3"]

rule all_counts:
    input: expand("trimmed.{cond}_{rep}_1.fq.count", cond = CONDITIONS, rep = REPLICATES)

$ snakemake -j1 -p all_counts

$ snakemake -j1 -p

# Input conditions and replicates to process
CONDITIONS = ["ref", "etoh60", "temp33"]
REPLICATES = ["1", "2", "3"]
READ_ENDS = ["1", "2"]
COUNT_DIR = ["reads", "trimmed"]

# Rule to make all counts at once
rule all_counts:
    input: expand("{indir}.{cond}_{rep}_{end}.fq.count", indir = COUNT_DIR, cond = CONDITIONS, rep = REPLICATES, end = READ_ENDS)
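To see which files the two all_counts rules ask for, the combinations can be reproduced in plain Python. The sketch below assumes that expand() with several lists yields every combination of the values (a Cartesian product); expand_sketch is a hypothetical stand-in written with itertools, not the Snakemake function itself, and the order of its results is not meant to say anything about Snakemake's ordering.

```python
from itertools import product


def expand_sketch(pattern, **wildcards):
    """Fill pattern with every combination of the given wildcard values."""
    names = list(wildcards)
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*wildcards.values())
    ]


CONDITIONS = ["ref", "etoh60", "temp33"]
REPLICATES = ["1", "2", "3"]
READ_ENDS = ["1", "2"]
COUNT_DIR = ["reads", "trimmed"]

# First version of the rule: 3 conditions x 3 replicates.
trimmed_counts = expand_sketch(
    "trimmed.{cond}_{rep}_1.fq.count", cond=CONDITIONS, rep=REPLICATES
)
print(len(trimmed_counts))  # 9

# Extended version: 2 directories x 3 conditions x 3 replicates x 2 read ends.
all_counts = expand_sketch(
    "{indir}.{cond}_{rep}_{end}.fq.count",
    indir=COUNT_DIR, cond=CONDITIONS, rep=REPLICATES, end=READ_ENDS,
)
print(len(all_counts))  # 36
```

Because the target rule lists concrete file names only, it contains no wildcards itself and can safely be the first rule in the Snakefile.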